CHAPTER 1. INTRODUCTION, SUMMARY AND RELATED WORK


1.1 Introduction

Regression analysis is concerned with the study of the relationship
between a response variable $Y$ and a set of predictor variables
$X = (X_1, X_2, \ldots, X_p)$. An important aspect of regression analysis
is the estimation of the regression function, i.e. of the conditional
expectation of $Y$, given $X$. In classical regression analysis, the
functional form of the regression function is assumed to be known up to a
finite set of unknown parameters, which may be estimated from data.

If  no  such  prior  knowledge  of  the  regression  function  exists,  then 
classical  methods  do  not  apply.  However,  in  this  case,  it  may  still  be 
desirable  to  obtain  an  estimate  of  the  regression  function,  either  for 
direct  analysis  or  to  establish  a  plausible  model  for  use  in  the  classical 
regression  analysis  mentioned  above. 

Thus  there  is  a  need  for  regression  analysis  methods  which  do  not 
assume  a  specific  mathematical  form  for  the  regression  function,  i.e. 
nonparametric  methods. 

In this study, a type of nonparametric estimator of the regression
function $m(x) = E[Y \mid X = x]$ will be investigated, where $(X, Y)$ is a
bivariate random vector.

Let $X$ and $Y$ be random variables defined on a probability space
$(\Omega, \mathcal{F}, P)$ with $E|Y| < \infty$. Denote the marginal
distribution function of $X$ by $F$. Then the regression function $m(x)$ is
defined by $m(x) = E[Y \mid X = x]$, i.e. the (unique a.e. ($dF$)) Borel
measurable function $m$ satisfying

$$\int_{X^{-1}B} Y \, dP = \int_{B} m(x) \, dF(x) \tag{1.1.1}$$

for all Borel sets $B$. If $X$ and $Y$ have a joint density function $f$,
then it follows that

$$m(x) = \begin{cases} \dfrac{\int_{-\infty}^{\infty} y f(x,y) \, dy}{f(x)} & \text{if } f(x) > 0 \\ 0 & \text{if } f(x) = 0 \end{cases} \tag{1.1.2}$$

is a version of the regression function, where $f(x)$ denotes the marginal
density of $X$. Motivated by (1.1.2) and previous work on estimation of
density functions by δ-function sequences, Watson (1964) suggested an
estimator of $m(x)$ of the form

$$m_n(x) = \frac{(1/n) \sum_{i=1}^{n} Y_i \delta_n(x - X_i)}{(1/n) \sum_{j=1}^{n} \delta_n(x - X_j)} \tag{1.1.3}$$

where $(X_1,Y_1), (X_2,Y_2), \ldots, (X_n,Y_n)$ are independent observations
on $(X,Y)$ and $\{\delta_n(x)\}$ is a sequence of weighting functions called
a δ-function sequence. The estimator $m_n(x)$ defined in (1.1.3) will be
investigated here. By rewriting (1.1.3) as

$$m_n(x) = \sum_{i=1}^{n} Y_i \left[ \frac{\delta_n(x - X_i)}{\sum_{j=1}^{n} \delta_n(x - X_j)} \right]$$

we have the intuitively appealing interpretation of $m_n(x)$ as a weighted
average of the Y-observations, with the weights depending on $x$ through
$\delta_n(x)$. Also, if one desires a smooth estimate of $m(x)$, this can be
achieved through the choice of $\delta_n(x)$.
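The weighted-average form of (1.1.3) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Gaussian shape `gaussian_kernel`, the scale parameter `eps` and the toy data are choices made here for the example, not part of the text.

```python
import math


def gaussian_kernel(u: float) -> float:
    """Standard normal density, used here as an illustrative weighting shape."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def delta_n(x: float, eps: float) -> float:
    """One assumed delta-function: the rescaled kernel (1/eps) * K(x/eps)."""
    return gaussian_kernel(x / eps) / eps


def watson_estimate(x: float, xs: list, ys: list, eps: float) -> float:
    """Estimator (1.1.3) written as a weighted average of the Y-observations."""
    raw = [delta_n(x - xi, eps) for xi in xs]
    total = sum(raw)
    if total == 0.0:
        return float("nan")  # no observation carries any weight at x
    return sum(w * y for w, y in zip(raw, ys)) / total


# Toy data (hypothetical): three observations placed symmetrically around x = 1.
xs = [0.0, 1.0, 2.0]
ys = [1.0, 3.0, 5.0]
estimate_at_one = watson_estimate(1.0, xs, ys, eps=0.5)
```

Because the weights are non-negative and sum to one for this choice of kernel, the estimate always lies between the smallest and largest observed Y value.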

In certain situations, the marginal density $f$ of $X$ is known. For
example, suppose in an experiment, we are able to fix the level of the
predictor variable $X$, but we wish to randomize $X$ to reduce sampling bias.
Then we would choose $X$ randomly according to a known density $f$. This
situation also arises in certain optimization problems, where the value
of the function to be optimized can only be determined up to a random
error term (see Devroye (1978)). Since the denominator of (1.1.3) is
intended to estimate $f$, a reasonable way to use the knowledge of $f$ might
be to use the modified estimator

$$\bar{m}_n(x) = \frac{(1/n) \sum_{i=1}^{n} Y_i \delta_n(x - X_i)}{f(x)} \tag{1.1.4}$$

We provide some preliminary comparisons of the estimators $m_n$ and
$\bar{m}_n$ in the known density case.
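As a rough illustration of the known-density variant, the sketch below divides the shared numerator by a known density instead of an estimated one. The Gaussian form of `delta_n`, the uniform density on [0, 1], the evenly spaced design points and the convention of returning 0 where the density vanishes are all assumptions made for this example.

```python
import math


def delta_n(x: float, eps: float) -> float:
    """Assumed kernel-type delta-function built from a Gaussian shape."""
    return math.exp(-0.5 * (x / eps) ** 2) / (eps * math.sqrt(2.0 * math.pi))


def numerator(x, xs, ys, eps):
    """(1/n) * sum of Y_i * delta_n(x - X_i), shared by both estimators."""
    return sum(y * delta_n(x - xi, eps) for xi, y in zip(xs, ys)) / len(xs)


def known_density_estimate(x, xs, ys, eps, density):
    """Modified estimator: the numerator divided by the known density f(x)."""
    fx = density(x)
    # Returning 0 where f(x) = 0 is a convention chosen for this sketch.
    return numerator(x, xs, ys, eps) / fx if fx > 0 else 0.0


def uniform_density(x: float) -> float:
    return 1.0 if 0.0 <= x <= 1.0 else 0.0


# Evenly spaced design points stand in for draws from the uniform density.
n = 1000
xs = [(i + 0.5) / n for i in range(n)]
ys = [2.0 * xi for xi in xs]  # noise-free responses, so m(x) = 2x
value = known_density_estimate(0.5, xs, ys, 0.05, uniform_density)
```

At the interior point 0.5 the value is close to m(0.5) = 1; behaviour at the edge of the support is a separate matter.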


1.2 Summary

Since we will assume a specific mathematical form for neither the
regression function $m$ nor the underlying probability distribution of
$(X, Y)$, we could not reasonably expect to obtain small sample results for
the estimators in question. Thus we shall concern ourselves here almost
exclusively with asymptotic results, as the sample size grows larger.

In Chapter 2, we rigorously establish (weak) pointwise consistency
of $m_n(x)$, which was proved heuristically by Watson (1964). Asymptotic
joint normality of $m_n(x)$, taken at a finite number of points, is
demonstrated. In this last result, we significantly weaken a condition of
Schuster (1972) on the δ-function sequence used, at the expense of
some mild additional regularity conditions. In the known density case,
we establish consistency and asymptotic normality for $\bar{m}_n$, so as to
provide a comparison with the asymptotic normality and consistency results
for $m_n$. We consider the mean integrated square error (MISE) of the
numerator of the estimator $m_n$. An explicit expression for the Fourier
transform of the δ-function which minimizes this MISE for each sample size
$n$ is derived, much as Watson and Leadbetter (1963) did for density
function estimators.

In Chapter 3, we consider the numerator of $m_n$, and show that the
supremum, taken over a finite interval, if properly centered and normalized,
converges in distribution to a random variable having an extreme value type
distribution. This result is then applied to establish uniform (weak)
consistency of $m_n$, with an associated rate of convergence.

In Chapter 4, we give some examples of calculations of the estimators
$m_n$ and $\bar{m}_n$ from simulated data.


1.3 Related Work

Estimators of the form (1.1.3) of the regression function and several
other types of nonparametric estimators of the regression function have
recently received attention in the literature. Here we survey the
recent literature on this subject.

1.3.1 Kernel Type Estimators

Several authors have considered estimators of the form (1.1.3) when
the δ-function sequence is of kernel type. Kernel type δ-function
sequences (defined rigorously in Lemma 2.1.2) are of the form

$$\delta_n(x) = \varepsilon_n^{-1} K(x/\varepsilon_n),$$

where $K$ is, e.g. a probability density function and $\{\varepsilon_n\}$ is
a positive real sequence with $\varepsilon_n \to 0$ as $n \to \infty$.

Schuster  (1972)  considers  the  asymptotic  normality  of  this  type  of 
estimator.  Because  of  the  close  relation  between  Schuster’s  work  and 
work  presented  in  this  dissertation,  we  defer  discussion  until  Section 
2.4. 

Schuster and Yakowitz (1978) consider a global error criterion for
this type of estimator, and for its derivatives as estimators of the
derivatives of the regression function. Let $g^{(r)}$ denote the r-th
derivative of the function $g$. Schuster and Yakowitz give conditions under
which, for any $\varepsilon > 0$ and $n$ sufficiently large,

$$P\left[ \sup_{a \le x \le b} |m_n^{(N)}(x) - m^{(N)}(x)| \ge \varepsilon \right] \le \frac{C}{n \varepsilon_n^{2N+2} \varepsilon^2}, \tag{1.3.1}$$

where $N$ is a positive integer, $[a,b]$ is a closed, bounded interval and
$C$ is a constant not depending on $n$. If $\{\varepsilon_n\}$ is such that
$n \varepsilon_n^{2N+2} \to \infty$ as $n \to \infty$, then (1.3.1) implies
that

$$\sup_{a \le x \le b} |m_n^{(N)}(x) - m^{(N)}(x)| \xrightarrow{P} 0,$$

as $n \to \infty$, so that this result is a type of uniform consistency
result.

It would be of interest to determine a rate of convergence to be
associated with the result, i.e., a positive, real sequence $\{b_n\}$ with
$b_n \to \infty$ such that

$$b_n \sup_{a \le x \le b} |m_n^{(N)}(x) - m^{(N)}(x)| \xrightarrow{P} 0.$$

This question is addressed in Chapter 3 of this dissertation for the case
$N = 0$. Schuster and Yakowitz also consider the case where the X
variable is non-stochastic, i.e., we have $\{f(\cdot\,; x),\ x \in [0,1]\}$
as a family of probability density functions and the object is to estimate

$$w(x) = \int y f(y; x) \, dy$$

on the basis of an independent sample $Y_i$, $i = 1, \ldots, n$, where $Y_i$
has density $f(\cdot\,; x_i)$. They give conditions under which a result
similar to (1.3.1) holds for the so-called Priestley-Chao estimator

$$w_n(x) = \varepsilon_n^{-1} \sum_{i=1}^{n} Y_i (x_i - x_{i-1}) K((x - x_i)/\varepsilon_n) \tag{1.3.2}$$

of $w(x)$ (see Priestley and Chao (1972)).
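For fixed, ordered design points the estimator (1.3.2) is a spacing-weighted kernel sum. The following sketch assumes a Gaussian kernel, an equally spaced design on (0, 1] and a left end point `x0 = 0` for the first spacing; none of these choices comes from the text.

```python
import math


def gaussian_kernel(u: float) -> float:
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def priestley_chao(x, design, ys, eps, x0=0.0):
    """Estimator (1.3.2) for ordered, non-random design points x_1 < ... < x_n.

    x0 is an assumed left end point used for the first spacing x_1 - x_0.
    """
    total = 0.0
    previous = x0
    for xi, yi in zip(design, ys):
        total += yi * (xi - previous) * gaussian_kernel((x - xi) / eps)
        previous = xi
    return total / eps


# Hypothetical design: n equally spaced points with a constant response of 2.
n = 1000
design = [i / n for i in range(1, n + 1)]
ys = [2.0 for _ in design]
value = priestley_chao(0.5, design, ys, eps=0.05)
```

No division by a density estimate is needed here, since the spacings play the role of the integration weights.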

In the non-stochastic X variable case as described above, Benedetti
(1974) shows that both the Watson estimator and the Priestley-Chao
estimator are (weakly) consistent and asymptotically normal for
appropriate values of $x$, but he points out some computational advantages
of the Watson estimator over the Priestley-Chao.

Konakov (1975) considers a quadratic deviation error criterion for
the Watson estimator with kernel type δ-sequences. Define the quadratic
deviation to be

$$T_n = \int (m_n(x) - m(x))^2 f_n^2(x) p(x) \, dx$$

where $f_n$ is a kernel type estimator of the marginal density $f$ of $X$ and
$p$ is a bounded integrable weight function. Konakov gives conditions
under which $T_n$, if properly normalized and centered, is asymptotically
normal. We do not consider quadratic deviation in this dissertation.


1.3.2  Nearest  Neighbor  Type  Estimators. 

Watson (1964) proposed estimating $m(x)$ with the average of the Y
values corresponding to the $k$ X values nearest to $x$, where $k$ is some
integer. This type of estimator is called the k nearest neighbor
estimator of the regression function. Earlier work had been done on
the classification problem and on estimation of a probability density
function using nearest neighbor techniques. (See Fix and Hodges (1951),
Cover and Hart (1967), Cover (1968), and Loftsgaarden and Quesenberry
(1965) for work in these areas.)


Let $k(n)$ be an integer depending on the sample size $n$ and denote
by $I_{k(n)}(x)$ the smallest open interval centered at $x$ containing no
less than $k$ of the X-observations. Then the k-nearest neighbor estimator
$\hat{m}_n$ can be written as

$$\hat{m}_n(x) = k^{-1} \sum_{X_i \in I_{k(n)}(x)} Y_i. \tag{1.3.3}$$


Stone (1977) points out that $\hat{m}_n(x)$ may be a discontinuous function,
and that in some cases, smoothness is a desirable property in a regression
function estimator. Lai (1977) proposes a modification of the k nearest
neighbor estimator which can have the desired smoothness property. This
estimator is very similar to the Watson estimator (1.1.3) with kernel
type δ-function sequence. Let $W$ be a probability density with $W(x) = 0$
for $|x| > 1$. Then Lai's estimator is defined by

$$\tilde{m}_n(x) = \frac{\sum_{i=1}^{n} Y_i W((x - X_i)/R_{k(n)}(x))}{\sum_{i=1}^{n} W((x - X_i)/R_{k(n)}(x))} \tag{1.3.4}$$

where $R_{k(n)}(x)$ is the radius of the interval $I_{k(n)}(x)$. This
estimator reduces to the form (1.3.3) when $W(x) = 1/2$ for $|x| \le 1$. Lai
proves the following.
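The relation between the k nearest neighbor average and Lai's weighted form can be checked numerically. In this sketch the radius is taken to be the distance to the k-th nearest X value, and the triangular weight is just one continuous density on [-1, 1] chosen for illustration; the function names and data are hypothetical.

```python
def knn_radius(x, xs, k):
    """Distance from x to its k-th nearest X-observation (assumed radius)."""
    return sorted(abs(x - xi) for xi in xs)[k - 1]


def uniform_weight(u):
    """W(u) = 1/2 on [-1, 1] and zero elsewhere."""
    return 0.5 if abs(u) <= 1.0 else 0.0


def triangular_weight(u):
    """A continuous density on [-1, 1], giving a smoother estimate."""
    return max(0.0, 1.0 - abs(u))


def lai_estimate(x, xs, ys, k, weight):
    """Weighted average with weights W((x - X_i) / R), R the k-NN radius."""
    radius = knn_radius(x, xs, k)
    if radius == 0.0:
        return float("nan")  # x coincides with at least k observations
    w = [weight((x - xi) / radius) for xi in xs]
    total = sum(w)
    if total == 0.0:
        return float("nan")  # e.g. triangular weight with k = 1
    return sum(wi * yi for wi, yi in zip(w, ys)) / total


def knn_estimate(x, xs, ys, k):
    """Plain average of the Y values attached to the k nearest X values."""
    nearest = sorted(zip((abs(x - xi) for xi in xs), ys))[:k]
    return sum(y for _, y in nearest) / k


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.0, 1.0, 4.0, 9.0, 16.0]
plain = knn_estimate(2.2, xs, ys, k=3)
via_uniform = lai_estimate(2.2, xs, ys, k=3, weight=uniform_weight)
smooth = lai_estimate(2.2, xs, ys, k=3, weight=triangular_weight)
```

With the uniform weight the two functions agree, while the triangular weight down-weights the more distant neighbors and so varies continuously as x moves.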


1.3.1 Theorem. Assume $W$ is continuous a.e., bounded and $W(x) = 0$ for
$|x| > 1$. If there exists an open set $U_0$ in $R$ on which

i) $f(x)$ is continuous, bounded, and $f(x) > 0$,
ii) $E(|Y| \mid X = x)$ and $E(\max(Y,0) \mid X = x)$ are continuous
functions of $x$,
iii) $\lim_{M \to \infty} \sup_{x \in U_0} E(|Y| I_{[|Y| \ge M]}(Y) \mid X = x) = 0$,
and if $EY^2 < \infty$ and
iv) $k(n)/n \to 0$ and $k(n)/\sqrt{n} \to \infty$,

then

$$\sup_{z \in A} |\tilde{m}_n(z) - m(z)| \to 0$$

in probability for any compact set $A \subset U_0$.

A similar result is proved for the estimator (1.1.3), with $A$ an
interval, in Chapter 3 of this dissertation. There, more regularity
conditions are applied to obtain an associated rate of convergence.

Stone (1977) considers the following type of nonparametric regression
function estimator:

$$\hat{m}_n(x) = \sum_{i=1}^{n} W_{ni}(x) Y_i \tag{1.3.5}$$

where $W_{ni}(x) = W_{ni}(x, X_1, \ldots, X_n)$ is a weight function. This
estimator reduces to the nearest neighbor, modified nearest neighbor and
δ-function type estimators discussed above for appropriate choices of the
function $W_{ni}$. Stone gives general conditions on the weight functions
for $\hat{m}_n$ to be consistent in $L^r$, i.e., for

$$E|\hat{m}_n(X) - m(X)|^r \to 0$$

whenever $E|Y|^r < \infty$. Stone's work applies to give minimal conditions
for this type of consistency for some types of estimators, e.g., if
$k(n) \to \infty$ and $k(n)/n \to 0$, then the k nearest neighbor estimator
is consistent in $L^r$. Stone, however, points out that it is not clear from
his results when an estimator of the Watson type is consistent in $L^r$.


1.3.3  Potential  Function  Methods. 

In this method, introduced by Aizerman, Braverman, and Rozonoer
(1970), the regression function $m$ is assumed to belong to a Hilbert
space $H$ and have the representation $m(x) = \sum_i c_i \varphi_i(x)$,
where $\{\varphi_i\}$ is a complete system of functions of $H$. The
estimator is calculated recursively by the formula

$$m_n(x) = m_{n-1}(x) + r_n K(x, X_n),$$

where

$$r_n = \gamma_n (Y_n - m_{n-1}(X_n)),$$

and $K$ is a "potential function" of the form

$$K(x,y) = \sum_i \lambda_i^2 \varphi_i(x) \varphi_i(y),$$

and $\{\gamma_n\}$ is a sequence of real numbers, and $m_0$ is chosen
arbitrarily.

We have the following type of consistency for this set-up. Suppose
$\sum_i (c_i/\lambda_i)^2 < \infty$, $\sum_i \gamma_i = \infty$,
$\sum_i \gamma_i^2 < \infty$. Then

$$\int [m_n(x) - m(x)]^2 f(x) \, dx \to 0$$

in probability as $n \to \infty$.

Fisher and Yakowitz (1976) obtain more general results for this
type of estimation, but for a more complicated error criterion.
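One step of the recursion can be traced with a small sketch. It assumes a finite basis, a potential function of the form K(x, y) = sum_i lambda_i^2 phi_i(x) phi_i(y), and the starting value m_0 = 0; the function names, the basis {1, x} and the step size are hypothetical choices for illustration only.

```python
def make_potential_estimator(basis, lambdas):
    """Track m_n(x) = sum_i a_i * phi_i(x) through its coefficients a_i."""
    coeffs = [0.0] * len(basis)  # m_0 = 0 is an arbitrary starting choice

    def predict(x):
        return sum(a * phi(x) for a, phi in zip(coeffs, basis))

    def update(x_n, y_n, gamma_n):
        # r_n = gamma_n * (Y_n - m_{n-1}(X_n))
        r_n = gamma_n * (y_n - predict(x_n))
        # Adding r_n * K(x, X_n) shifts coefficient i by
        # r_n * lambda_i^2 * phi_i(X_n) under the assumed form of K.
        for i, (lam, phi) in enumerate(zip(lambdas, basis)):
            coeffs[i] += r_n * lam * lam * phi(x_n)

    return predict, update


# Hypothetical two-function basis {1, x} with unit lambdas.
basis = [lambda x: 1.0, lambda x: x]
predict, update = make_potential_estimator(basis, lambdas=[1.0, 1.0])
update(x_n=2.0, y_n=4.0, gamma_n=0.1)
after_one_step = (predict(1.0), predict(2.0))
```

Storing coefficients rather than the function itself works only because the basis is finite here; with an infinite system one would evaluate the potential function directly.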

1.3.4  Estimates  Based  on  Ranks. 

Let $X_{n1} \le X_{n2} \le \cdots \le X_{nn}$ denote the ordered X values and
define the concomitant of $X_{ni} = X_j$ to be $Y_{ni} = Y_j$. The set
$Y_{ni}$, $i = 1, \ldots, n$, are sometimes called the induced order
statistics. Yang (1977) proposes the following estimator of $m(x)$ based on
concomitants:

$$\hat{m}_n(x) = (n \varepsilon_n)^{-1} \sum_{i=1}^{n} K((i/n - F_n(x))/\varepsilon_n) Y_{ni}$$

where $\varepsilon_n^{-1} K(x/\varepsilon_n)$ is a kernel type δ-function
sequence and $F_n$ is the empirical distribution function of the X values.
Yang gives conditions under which this estimator is (weakly) consistent and
asymptotically normal at appropriate points $x$.

Bhattacharya (1976) discusses estimation of a function related to the
regression function based on concomitants. Let $F$ denote the marginal
distribution function of $X$ and define

$$h(t) = m \circ F^{-1}(t); \qquad H(t) = \int_0^t h(s) \, ds, \qquad 0 \le t \le 1.$$

Natural estimators of $H$ are

$$H_n(t) = n^{-1} \sum_{i=1}^{[nt]} Y_{ni}$$

and

$$H_n^*(t) = n^{-1} \sum_{F(X_{ni}) \le t} Y_{ni}$$

if $F$ is known. Bhattacharya obtains weak convergence in $D[0,1]$ results
for these estimators and applies them to estimation and hypothesis
testing problems.
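The concomitants and the partial-sum estimator of H are simple to compute. The sketch below assumes the estimator is n^{-1} times the sum of the first [nt] concomitants, with [nt] the integer part of nt; the function names and the four data points are hypothetical.

```python
import math


def concomitants(xs, ys):
    """Y values reordered by increasing X (the induced order statistics)."""
    return [y for _, y in sorted(zip(xs, ys))]


def partial_sum_estimate(t, xs, ys):
    """n^{-1} times the sum of the first [nt] concomitants."""
    n = len(xs)
    upto = math.floor(n * t)
    return sum(concomitants(xs, ys)[:upto]) / n


xs = [3.0, 1.0, 2.0, 4.0]
ys = [30.0, 10.0, 20.0, 40.0]
ordered = concomitants(xs, ys)
half = partial_sum_estimate(0.5, xs, ys)
full = partial_sum_estimate(1.0, xs, ys)
```

At t = 1 the estimate is just the sample mean of the Y values, which matches H(1) = E[Y].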


CHAPTER 2. CONSISTENCY AND NORMALITY


2.1 δ-function Sequences

A certain class of δ-function sequences was suggested originally by
Rosenblatt (1956) and, under slightly weaker conditions, by Parzen (1962)
for use in probability density function estimation. Leadbetter (1963)
and Watson and Leadbetter (1964) introduced a more general notion of
δ-function sequences, and our approach throughout this study will be
to obtain results for the more general type of δ-functions whenever
possible.

The following (2.1.1 - 2.1.4) is due to Leadbetter (1963).

2.1.1 Definition. A sequence of integrable functions $\{\delta_n(x)\}$ is
called a δ-function sequence if it satisfies the following set of conditions
(integrals with no limits of integration are meant to extend over the
entire real line):

C1. $\int |\delta_n(x)| \, dx < A$ for all $n$ and some fixed $A$,

C2. $\int \delta_n(x) \, dx = 1$ for all $n$,

C3. $\delta_n(x) \to 0$ uniformly on $|x| \ge \lambda$ for any fixed $\lambda > 0$,

C4. $\int_{|x| \ge \lambda} \delta_n(x) \, dx \to 0$ for any fixed $\lambda > 0$.

The next lemma describes the type of δ-function sequence used by
Rosenblatt (1956) and Parzen (1962), although the conditions on the
function $K$ are slightly different. This type of δ-function sequence will be
referred to as "kernel type" and $K$ as a "kernel function."

2.1.2 Lemma. Let $\{\varepsilon_n\}$ be a sequence of non-zero constants with
$\varepsilon_n \to 0$ as $n \to \infty$ and let $K$ be an integrable function
such that $\int K(x) \, dx = 1$ and $K(x) = o(x^{-1})$ as $|x| \to \infty$.
Then $\{\varepsilon_n^{-1} K(x/\varepsilon_n)\}$ is a δ-function sequence.

The following lemma demonstrates the similarity of δ-function
sequences as defined above and the Dirac δ-function.

2.1.3 Lemma. If $g(x)$ is integrable and continuous at $x = 0$ and
$\{\delta_n\}$ is a δ-function sequence, then $g(x)\delta_n(x)$ is integrable
for each $n$ and

$$\int g(x) \delta_n(x) \, dx \to g(0) \quad \text{as } n \to \infty. \qquad □$$
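A quick numerical check of this limit may help fix ideas. The sketch assumes a kernel-type sequence with a Gaussian kernel and the test function g(x) = 1/(1 + x^2); the midpoint-rule integrator and its grid settings are choices made only for this illustration.

```python
import math


def gaussian_kernel(y: float) -> float:
    return math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)


def smoothed_value(g, eps, half_width=8.0, steps=1600):
    """Midpoint-rule value of the integral of g(x) * (1/eps) K(x/eps) dx.

    The substitution x = eps * y turns it into the integral of K(y) g(eps*y) dy.
    """
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps):
        y = -half_width + (i + 0.5) * h
        total += gaussian_kernel(y) * g(eps * y)
    return total * h


def g(x: float) -> float:
    return 1.0 / (1.0 + x * x)  # integrable, continuous at 0, g(0) = 1


# Distance from g(0) for a shrinking scale parameter.
errors = [abs(smoothed_value(g, eps) - g(0.0)) for eps in (1.0, 0.1, 0.01)]
```

The errors shrink as the scale parameter decreases, in line with the lemma.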

2.1.4 Lemma. Let $\{\delta_n(x)\}$ be a δ-function sequence such that, for
$p > 1$, $\alpha_n(p) = \int |\delta_n(u)|^p \, du < \infty$ for each $n$.
Then $\alpha_n(p) \to \infty$ and

$$\{\delta_{n,p}(x)\} = \{|\delta_n(x)|^p / \alpha_n(p)\}$$

is a δ-function sequence for $p > 1$. □

Rosenblatt (1971) states the following lemma, which gives a rate
of convergence for Lemma 2.1.3, when the δ-function sequence is of
kernel type. We include a proof for completeness.

2.1.5 Lemma. Suppose $g$ is an integrable function with bounded,
continuous 1st and 2nd derivatives. Let

$$\{\delta_n(x)\} = \{\varepsilon_n^{-1} K(x/\varepsilon_n)\}$$

be a δ-function sequence of kernel type with

$$\int u K(u) \, du = 0 \quad \text{and} \quad \int |u^2 K(u)| \, du < \infty.$$

Then

$$\varepsilon_n^{-1} \int K((x-u)/\varepsilon_n) g(u) \, du = g(x) + O(\varepsilon_n^2),$$

where the sequence represented by $O$ does not depend on $x$.

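Proof. Write

$$\varepsilon_n^{-1} \int K((x-u)/\varepsilon_n) g(u) \, du = \int K(y) g(x - \varepsilon_n y) \, dy$$

and

$$g(x - \varepsilon_n y) = g(x) - g'(x) \varepsilon_n y + g''(\tau) \varepsilon_n^2 y^2 / 2$$

where $\tau = \tau_n(x,y)$ is between $x$ and $x - \varepsilon_n y$. Thus

$$\varepsilon_n^{-1} \int K((x-u)/\varepsilon_n) g(u) \, du = g(x) \int K(y) \, dy - g'(x) \varepsilon_n \int y K(y) \, dy + (\varepsilon_n^2/2) \int g''(\tau) y^2 K(y) \, dy$$

and hence

$$\left| \varepsilon_n^{-1} \int K((x-u)/\varepsilon_n) g(u) \, du - g(x) \right| \le \varepsilon_n^2 \sup_t |g''(t)/2| \int |y^2 K(y)| \, dy$$

since $\int K(u) \, du = 1$ and $\int u K(u) \, du = 0$. The conclusion follows
from the last inequality. □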
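The quadratic rate can be seen numerically: halving the scale parameter should cut the smoothing error by roughly a factor of four. The sketch assumes a Gaussian kernel and g(x) = 1/(1 + x^2) evaluated at x = 0, for which sup |g''/2| = 1 and the second moment of the kernel is 1, so the bound in the lemma reads eps^2.

```python
import math


def gaussian_kernel(y: float) -> float:
    return math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)


def g(x: float) -> float:
    return 1.0 / (1.0 + x * x)


def smoothing_error(x, eps, half_width=8.0, steps=1600):
    """|integral of K(y) g(x - eps*y) dy - g(x)| by the midpoint rule."""
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps):
        y = -half_width + (i + 0.5) * h
        total += gaussian_kernel(y) * g(x - eps * y)
    return abs(total * h - g(x))


error_coarse = smoothing_error(0.0, 0.10)
error_fine = smoothing_error(0.0, 0.05)
ratio = error_coarse / error_fine  # close to 4 if the error scales like eps^2
```

Both errors also sit just below eps^2, which is what the final inequality of the proof gives for this particular g and kernel.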

The following lemmas will be useful in the sequel.

2.1.6 Lemma. Let $\{\delta_n\}$ be a δ-function sequence and $g$ an
integrable function. If $g$ is continuous at $x$ and $y$ and $x \ne y$, then
$\delta_n(x-u)\delta_n(y-u)g(u)$ is integrable and

$$\int \delta_n(x-u) \delta_n(y-u) g(u) \, du \to 0$$

as $n \to \infty$.


Proof. For convenience, assume $x < y$. By Lemma 2.1.3,
$\delta_n(y-u)g(u)$ is an integrable function for each $n$, as is
$\delta_n(x-u)g(u)$. Choose $\lambda$ so that $x < \lambda < y$. Then

$$\left| \int \delta_n(x-u) \delta_n(y-u) g(u) \, du \right| \le \int_{-\infty}^{\lambda} |\delta_n(x-u) \delta_n(y-u) g(u)| \, du + \int_{\lambda}^{\infty} |\delta_n(x-u) \delta_n(y-u) g(u)| \, du$$

$$\le \sup_{u \le \lambda} |\delta_n(y-u)| \int |\delta_n(x-u) g(u)| \, du + \sup_{u \ge \lambda} |\delta_n(x-u)| \int |\delta_n(y-u) g(u)| \, du.$$

Now $\sup_{u \le \lambda} |\delta_n(y-u)|$ and $\sup_{u \ge \lambda} |\delta_n(x-u)|$
converge to zero by C3 of Definition 2.1.1. Further,
$\int |\delta_n(x-u) g(u)| \, du < \infty$ for each $n$ by the preceding
remark, and, in fact, by Lemmas 2.1.3 and 2.1.4,

$$(\alpha_n(1))^{-1} \int |\delta_n(x-u) g(u)| \, du \to |g(x)|$$

and

$$\alpha_n(1) = \int |\delta_n(u)| \, du < A$$

for some constant $A$. Thus $\int |\delta_n(x-u) g(u)| \, du$ is a bounded
sequence, and we have

$$\sup_{u \le \lambda} |\delta_n(y-u)| \int |\delta_n(x-u) g(u)| \, du \to 0$$

as $n \to \infty$. The same argument applies when $x$ and $y$ are
interchanged, and the conclusion follows. □


2.1.7 Lemma. Let $\{\delta_n\}$ be a δ-function sequence such that
$\delta_n$ is an even function for each $n$ and $g$ an integrable function
such that $g$ has both left and right hand limits at 0. Then
$\delta_n(x)g(x)$ is an integrable function for each $n$ and

$$\int \delta_n(x) g(x) \, dx \to (g(0+) + g(0-))/2$$

as $n \to \infty$.


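Proof. Define

$$g^+(x) = \begin{cases} g(x), & x > 0 \\ g(0+), & x = 0 \\ g(-x), & x < 0 \end{cases} \qquad g^-(x) = \begin{cases} g(-x), & x > 0 \\ g(0-), & x = 0 \\ g(x), & x < 0 \end{cases}$$

Clearly, $g^+$ and $g^-$ are even functions, continuous at 0. Further, they
are both integrable functions, since, e.g.

$$\int |g^+(u)| \, du = 2 \int_0^{\infty} |g^+(u)| \, du = 2 \int_0^{\infty} |g(u)| \, du < \infty.$$

Thus, by Lemma 2.1.3,

$$\int \delta_n(x) g^+(x) \, dx \to g^+(0) = g(0+)$$

as $n \to \infty$, and similarly $\int \delta_n(x) g^-(x) \, dx \to g(0-)$. But

$$\int |\delta_n(x) g(x)| \, dx = \int_0^{\infty} |\delta_n(x) g(x)| \, dx + \int_{-\infty}^{0} |\delta_n(x) g(x)| \, dx$$

$$= (1/2) \int |g^+(x) \delta_n(x)| \, dx + (1/2) \int |g^-(x) \delta_n(x)| \, dx < \infty$$

by Lemma 2.1.3, so that $\delta_n(x)g(x)$ is integrable, and

$$\int \delta_n(x) g(x) \, dx = \int_0^{\infty} \delta_n(x) g(x) \, dx + \int_{-\infty}^{0} \delta_n(x) g(x) \, dx$$

$$= (1/2) \int \delta_n(x) g^+(x) \, dx + (1/2) \int \delta_n(x) g^-(x) \, dx \to (g(0+) + g(0-))/2$$

by the preceding remark. □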
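The averaging of the one-sided limits can be observed with a function that jumps at 0. In this sketch the even δ-function is a rescaled Gaussian and g is a made-up example with g(0+) = 1 and g(0-) = 3, so the limit in the lemma is 2; the integrator settings are again illustrative.

```python
import math


def gaussian_kernel(y: float) -> float:
    return math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)


def g(x: float) -> float:
    """Integrable function with a jump at 0: g(0+) = 1 and g(0-) = 3."""
    if x > 0:
        return math.exp(-x)
    return 3.0 * math.exp(x)


def smoothed_value(eps, half_width=8.0, steps=1600):
    """Midpoint-rule value of the integral of K(y) g(eps * y) dy.

    The cell boundaries include y = 0, so no midpoint lands on the jump.
    """
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps):
        y = -half_width + (i + 0.5) * h
        total += gaussian_kernel(y) * g(eps * y)
    return total * h


value = smoothed_value(eps=0.001)
expected_limit = (1.0 + 3.0) / 2.0
```

Evenness matters here: an asymmetric weighting function would split its mass unequally between the two sides of the jump.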

2.2 Nonparametric Density Function Estimation.

Nonparametric methods of density function estimation have been studied
in great detail (see, e.g., Wegman (1972a) and Wegman (1972b) for a survey
and comparison of work in this area). Estimators of a density $f(x)$ of the
form

$$f_n(x) = (1/n) \sum_{i=1}^{n} \delta_n(x - X_i) \tag{2.2.1}$$

are of particular interest here because of the reliance of our proposed
regression function estimators on the same type of weighting functions
$\delta_n$, and because $f_n$, defined in (2.2.1), appears in the
denominator of $m_n$ as defined in equation (1.1.3).

Since $X_i$, $i = 1, \ldots, n$ are i.i.d. with common density $f$, we have

$$E f_n(x) = \int \delta_n(x-u) f(u) \, du,$$

which has $f(x)$ as its limiting value as $n \to \infty$, provided $f$ is
continuous at $x$, by Lemma 2.1.3. That is, $f_n$ is an asymptotically
unbiased estimator of $f$ at continuity points of $f$. Further,

$$E f_n^2(x) = (1/n^2) \sum_{i=1}^{n} \sum_{j=1}^{n} E[\delta_n(x - X_i) \delta_n(x - X_j)]$$

$$= (1/n) \int \delta_n^2(x-u) f(u) \, du + ((n-1)/n) \left[ \int \delta_n(x-u) f(u) \, du \right]^2,$$

so that

$$\mathrm{Var}[f_n(x)] = (1/n) \int \delta_n^2(x-u) f(u) \, du - (1/n) \left[ \int \delta_n(x-u) f(u) \, du \right]^2$$

and we thus have, if $\alpha_n = \int \delta_n^2(u) \, du < \infty$ for each
$n$, by Lemma 2.1.4,

$$(n/\alpha_n) \mathrm{Var}[f_n(x)] \to f(x)$$

as $n \to \infty$ at continuity points $x$ of $f$. The above calculations
(which appear in Watson and Leadbetter (1964)) may be combined to give
conditions under which the mean square error of $f_n$ converges to zero, as
the following lemma shows.

2.2.1 Lemma. Let $\{\delta_n\}$ be a δ-function sequence for which
$\alpha_n = \int \delta_n^2(u) \, du < \infty$ for each $n$ and
$\alpha_n/n \to 0$ as $n \to \infty$. Let $x$ be a continuity point of $f$.
Then

$$E[f_n(x) - f(x)]^2 \to 0 \quad \text{as } n \to \infty. \qquad □$$

2.2.2 Remark. By Chebychev's inequality,

$$P\{|f_n(x) - f(x)| > \varepsilon\} \le \varepsilon^{-2} E[f_n(x) - f(x)]^2$$

for any $\varepsilon > 0$. We thus have $f_n(x) \to f(x)$ in probability,
provided the conditions of Lemma 2.2.1 are satisfied. That is, $f_n(x)$ is a
weakly consistent estimator of $f(x)$ for appropriate δ-function sequences
and points $x$.
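A small simulation gives a feel for this consistency statement. The sketch assumes a Gaussian kernel-type δ-function, a standard normal sample drawn with a fixed seed, and a scale parameter of 0.2; all of these are illustrative choices rather than anything prescribed above.

```python
import math
import random


def delta_n(x: float, eps: float) -> float:
    """Assumed kernel-type delta-function built from a Gaussian shape."""
    return math.exp(-0.5 * (x / eps) ** 2) / (eps * math.sqrt(2.0 * math.pi))


def density_estimate(x, sample, eps):
    """Estimator (2.2.1): the average of delta_n(x - X_i) over the sample."""
    return sum(delta_n(x - xi, eps) for xi in sample) / len(sample)


rng = random.Random(0)  # fixed seed so the run is repeatable
sample = [rng.gauss(0.0, 1.0) for _ in range(5000)]
estimate = density_estimate(0.0, sample, eps=0.2)
true_value = 1.0 / math.sqrt(2.0 * math.pi)  # standard normal density at 0
```

With 5000 observations the estimate at 0 lands near the true density value; the remaining gap combines a small smoothing bias with sampling variability.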

The  preceding  discussion  on  density  estimation  will  suffice  for  our 
discussion  of  pointwise  consistency  of  our  proposed  regression  estimators. 
We  will  include  other  pertinent  results  on  density  estimation  as  they  are 
needed. 

2.3 Pointwise Consistency Properties of $m_n$ and $\bar{m}_n$.

We begin our discussion by considering the numerator of the estimators
$m_n$ and $\bar{m}_n$ defined in (1.1.3) and (1.1.4), respectively. Denote,
for convenience, the numerator by $m_n^*$, i.e.,

$$m_n^*(x) = (1/n) \sum_{i=1}^{n} Y_i \delta_n(x - X_i).$$

Then we have the following.

2.3.1 Lemma. Let $\{\delta_n\}$ be a δ-function sequence for which
$\alpha_n = \int \delta_n^2(u) \, du < \infty$ for each $n$. Suppose
$EY^2 < \infty$ and $x$ is a point of continuity of the functions $f(u)$,
$m(u) = E[Y \mid X = u]$ and $s(u) = E[Y^2 \mid X = u]$. Then

(i) $E m_n^*(x) \to m(x) f(x)$

(ii) $(n/\alpha_n) \mathrm{Var}[m_n^*(x)] \to s(x) f(x)$

as $n \to \infty$.


Proof. We will use the following two well known properties of the
regression function:

$$E h(Y) = \int E[h(Y) \mid X = x] f(x) \, dx \tag{2.3.1}$$

for any function $h$ and random variable $Y$ such that $E|h(Y)| < \infty$,

$$E[g(X) h(Y) \mid X = x] = g(x) E[h(Y) \mid X = x] \tag{2.3.2}$$

for any functions $g$ and $h$ such that $E|g(X)h(Y)| < \infty$.
Since $(X_i, Y_i)$, $i = 1, \ldots, n$ are i.i.d., we have

$$E m_n^*(x) = E Y \delta_n(x - X)$$

$$= \int E[Y \delta_n(x - X) \mid X = u] f(u) \, du$$

by (2.3.1),

$$= \int \delta_n(x-u) E[Y \mid X = u] f(u) \, du$$

by (2.3.2),

$$= \int \delta_n(x-u) m(u) f(u) \, du.$$

Now, by assumption, $m(u)f(u)$ is continuous at $u = x$. Thus (i) will
follow from Lemma 2.1.3 if we demonstrate that $m(u)f(u)$ is an
integrable function. To verify this, note that, by Jensen's inequality

$$|m(x)| = |E[Y \mid X = x]| \le E[|Y| \mid X = x].$$

Thus

$$\int |m(u) f(u)| \, du \le \int E[|Y| \mid X = u] f(u) \, du = E|Y| < \infty$$

by assumption, and (i) follows.

For (ii), note

$$E[m_n^*(x)]^2 = (1/n)^2 E\left\{ \sum_{i=1}^{n} [Y_i \delta_n(x - X_i)]^2 + \sum_{i \ne j} Y_i \delta_n(x - X_i) Y_j \delta_n(x - X_j) \right\}$$

$$= (1/n) \int s(u) f(u) \delta_n^2(x-u) \, du + ((n-1)/n) \left[ \int m(u) f(u) \delta_n(x-u) \, du \right]^2,$$

the last step following from (2.3.1) and (2.3.2), as used in the proof
of (i). Thus

$$\mathrm{Var}[m_n^*(x)] = (1/n) \int s(u) f(u) \delta_n^2(x-u) \, du - (1/n) \left[ \int m(u) f(u) \delta_n(x-u) \, du \right]^2,$$

and, since $\alpha_n \to \infty$ and $\{\delta_n^2/\alpha_n\}$ is a
δ-function sequence, we have

$$(n/\alpha_n) \mathrm{Var}[m_n^*(x)] \to s(x) f(x),$$

as desired. □

We now use the preceding result to demonstrate the consistency of
the estimators $m_n$ and $\bar{m}_n$.

2.3.2 Theorem. Let $\{\delta_n\}$ be a δ-function sequence such that
$\alpha_n = \int \delta_n^2(u) \, du < \infty$ for each $n$ and
$\alpha_n = o(n)$. Suppose $EY^2 < \infty$ and $x$ is a continuity point of
$f(u)$, $m(u)$ and $s(u)$, and that $f(x) > 0$. Then

(i) $m_n(x) \xrightarrow{P} m(x)$

(ii) $\bar{m}_n(x) \xrightarrow{P} m(x)$.

Proof. Since $\alpha_n/n \to 0$ by assumption, we have
$\mathrm{Var}[m_n^*(x)] \to 0$ by Lemma 2.3.1 (ii), while
$E m_n^*(x) \to m(x)f(x)$ by Lemma 2.3.1 (i). Thus

$$E[m_n^*(x) - m(x) f(x)]^2 \to 0$$

and by applying Chebychev's inequality as in Remark 2.2.2, we see that
$m_n^*(x) \xrightarrow{P} m(x)f(x)$. Since, by definition

$$\bar{m}_n(x) = m_n^*(x)/f(x),$$

and $f(x) > 0$, (ii) follows immediately.

For (i), write

$$m_n - m = \frac{m_n^* - mf}{f_n} + \frac{m(f - f_n)}{f_n},$$

where we have suppressed the argument $x$ for convenience. Now
$m_n^* \xrightarrow{P} mf$ by the argument for (ii), and since
$f_n(x) \xrightarrow{P} f(x) > 0$, we have

$$\frac{m_n^* - mf}{f_n} \xrightarrow{P} 0.$$

Similarly,

$$\frac{m(f - f_n)}{f_n} \xrightarrow{P} 0$$

and (i) follows. □

We have, then, that both the estimators $m_n$ and $\bar{m}_n$ are weakly
consistent estimators of $m$ at continuity points of $s$, $m$ and $f$. The
following corollary specializes the preceding theorem to kernel type
δ-function sequences.
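2.3.3 Corollary. Suppose that $\{\delta_n(x)\} = \{\varepsilon_n^{-1} K(x/\varepsilon_n)\}$
is a δ-function sequence of kernel type with $n\varepsilon_n \to \infty$ as
$n \to \infty$ and $\int K^2(u) \, du < \infty$. Assume the other conditions
of Theorem 2.3.2 are satisfied. Then the conclusions of Theorem 2.3.2 hold.

Proof. We need only verify that $\alpha_n/n \to 0$. By definition

$$\alpha_n = \int \delta_n^2(u) \, du = \varepsilon_n^{-2} \int K^2(x/\varepsilon_n) \, dx = \varepsilon_n^{-1} \int K^2(u) \, du.$$

Thus

$$\alpha_n/n = (n\varepsilon_n)^{-1} \int K^2(u) \, du \to 0$$

by assumption. □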
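The identity used in this proof is easy to confirm numerically. The sketch assumes a Gaussian kernel, for which the integral of K^2 equals 1/(2 sqrt(pi)), and a hypothetical scale sequence eps_n = n^(-1/5), which satisfies n * eps_n -> infinity.

```python
import math


def gaussian_kernel(u: float) -> float:
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def alpha_n(eps, half_width=8.0, steps=1600):
    """Midpoint-rule integral of delta_n(u)^2 with delta_n(u) = K(u/eps)/eps."""
    h = 2.0 * half_width * eps / steps
    total = 0.0
    for i in range(steps):
        u = -half_width * eps + (i + 0.5) * h
        total += (gaussian_kernel(u / eps) / eps) ** 2
    return total * h


def bandwidth(n: int) -> float:
    return n ** (-0.2)  # assumed choice eps_n = n^(-1/5)


k_squared_integral = 1.0 / (2.0 * math.sqrt(math.pi))  # Gaussian kernel only
ratios = [alpha_n(bandwidth(n)) / n for n in (10, 1000, 100000)]
```

The ratios alpha_n / n fall steadily toward zero, which is the only condition the corollary has to check.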

We have so far been assuming that the density $f$ of $X$ is continuous
and positive at points where we wish to estimate the regression function.
An important case where these assumptions may not hold is when $X$ is a
bounded random variable, i.e. when $f$ has bounded support, and we desire an
estimate at a boundary point of the support of $f$. We have the following
result, which demonstrates that if $m(x+) = m(x)$, $m_n(x)$ is consistent
but $\bar{m}_n(x)$ is not.
n 

2.3.4 Theorem. Let $\{\delta_n\}$ be a δ-function sequence such that
$\alpha_n < \infty$ for each $n$ and $\alpha_n/n \to 0$. Suppose
$EY^2 < \infty$ and $x$ is a point such that $f$, $m$ and $s$ have left and
right hand limits at $x$ and $f(x+) = f(x) > 0$, $f(x-) = 0$. Then

(i) $\bar{m}_n(x) \xrightarrow{P} m(x+)/2$

(ii) $m_n(x) \xrightarrow{P} m(x+)$.

Proof. By Lemma 2.1.7 it follows that

$$E f_n(x) \to f(x)/2, \qquad E m_n^*(x) \to m(x+) f(x)/2.$$

By an argument similar to the one used in the proof of Lemma 2.3.1, it
follows that

$$f_n(x) \xrightarrow{P} f(x)/2, \qquad m_n^*(x) \xrightarrow{P} m(x+) f(x)/2.$$

Since

$$\bar{m}_n(x) = m_n^*(x)/f(x),$$

(i) follows. Since

$$m_n(x) = m_n^*(x)/f_n(x),$$

(ii) follows by an argument similar to the one used in Theorem 2.3.2. □
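The contrast between the two estimators at a boundary point can be reproduced in a noise-free toy setting. The sketch assumes a uniform density on [0, 1], evenly spaced design points standing in for random draws, responses m(x) = 1 + x, and a Gaussian kernel-type δ-function; these are illustrative assumptions only.

```python
import math


def delta_n(x: float, eps: float) -> float:
    """Assumed even, kernel-type delta-function built from a Gaussian shape."""
    return math.exp(-0.5 * (x / eps) ** 2) / (eps * math.sqrt(2.0 * math.pi))


n = 2000
xs = [(i + 0.5) / n for i in range(n)]  # stand-in for uniform draws on [0, 1]
ys = [1.0 + xi for xi in xs]            # noise-free, so m(0+) = 1
eps = 0.02
x = 0.0                                 # left boundary of the support

numerator = sum(y * delta_n(x - xi, eps) for xi, y in zip(xs, ys)) / n
f_n = sum(delta_n(x - xi, eps) for xi in xs) / n

ratio_estimate = numerator / f_n        # divides by the estimated density
known_density_estimate = numerator / 1.0  # divides by the known f(0) = 1
```

Here the density estimate settles near f(0)/2, so the ratio form recovers roughly m(0+) = 1 while the known-density form returns roughly half of it.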
This theorem demonstrates that the estimator $\bar{m}_n$ displays an
"end effect" at the boundaries of the range of $X$ which $m_n$ does not
display. As we shall see in Chapter 4, this end effect represents a
possible disadvantage for $\bar{m}_n$, depending on how $m$ is defined at the
boundaries of its support. We now turn our attention to asymptotic
distributional properties of $m_n$ and $\bar{m}_n$.

2.4 Asymptotic Distribution of $m_n$.

Nadaraya (1964) and Schuster (1972) have considered the asymptotic
normality of the estimator $m_n$ in terms of kernel type δ-function
sequences. Nadaraya states that, if $Y$ is a bounded random variable and
$n\varepsilon_n^2 \to \infty$ then
$(n\varepsilon_n)^{1/2}(m_n(x) - E m_n(x))$ has an asymptotically normal
distribution with zero mean and variance $s(x) \int K^2(u) \, du / f(x)$,
where $s(x) = E[Y^2 \mid X = x]$.

Schuster (1972) points out that this expression for the variance is
incorrect and presents a result with the correct variance which at the same
time removes the restriction that $Y$ be bounded and centers at $m(x)$
instead of $E m_n(x)$. We state Schuster's result here for comparison with a
new result which represents, in some respects, an improvement over
Schuster's.

2.4.1 Theorem. Let $\{\varepsilon_n^{-1} K(x/\varepsilon_n)\}$ be a
δ-function sequence satisfying the conditions:

(i) $K(u)$ and $uK(u)$ are bounded,

(ii) $\int u K(u) \, du = 0$, $\int u^2 K(u) \, du < \infty$,

(iii) $n\varepsilon_n^3 \to \infty$, $n\varepsilon_n^5 \to 0$ as $n \to \infty$.

Suppose $x_1, x_2, \ldots, x_p$ are distinct points with $f(x_i) > 0$,
$i = 1, \ldots, p$. Let $w(u) = m(u)f(u)$ and assume $f'$, $w'$, $s'$, $f''$,
$w''$ exist and are bounded, and that $E|Y|^3 < \infty$. Then

$$(n\varepsilon_n)^{1/2} (m_n(x_1) - m(x_1), \ldots, m_n(x_p) - m(x_p))'$$

converges in distribution to a multivariate normal random vector with
zero mean vector and diagonal covariance matrix with i-th diagonal
element given by

$$\sigma^2(x_i) \int K^2(u) \, du / f(x_i)$$

where

$$\sigma^2(u) = s(u) - m^2(u).$$

Schuster proves this theorem by using the Berry-Esseen theorem to
show the joint asymptotic normality of the numerator and denominator of
$m_n$. An application of the Mann-Wald theorem (Billingsley (1968)) then
yields the desired result. As we shall see, it is not necessary to
consider the joint distribution of the numerator and denominator of $m_n$ in
order to establish asymptotic normality. Schuster's proof can thus be
simplified. Also, by using the Lindeberg-Feller central limit theorem,
instead of the Berry-Esseen theorem, we will be able to require the
δ-function sequence to satisfy a less restrictive condition, namely that
$n\varepsilon_n \to \infty$ instead of $n\varepsilon_n^3 \to \infty$.
n  n 

We  now  present  the  new  asymptotic  normality  result  for  .  The  most 
important  difference  between  this  theorem  (when  stated  in  terms  ot  Kernel 
type  A  sequences)  and  Theorem  2.4.1  is  that  it  only  ivquitvs  tu  *  >" 


instead  ot"  n>  ' 
n 
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*  ®.  Also,  it  applies  to  general  6-function  sequences. 
There  are  minor  differences  in  the  other  conditions  which  will  he  evi 
dent  in  the  statement  of  the  theorem.  IVe  first  state  the  main  theorem, 
and  then  prove  a  preliminary  lenma  before  returning  to  the  proof  of  the 
theorem.  We  then  specialize  to  kernel  type  6-function  sequences. 

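2.4.2  Theorem. Let $\{\delta_n(x)\}$ be a sequence of $\delta$-functions such that

    $$\gamma_n = \int |\delta_n(u)|^{2+\eta}\,du < \infty \quad \text{for each } n,$$

for some $\eta > 0$,

    $$\alpha_n = \int \delta_n^2(u)\,du = o(n) \quad \text{as } n \to \infty,$$

and

    $$\gamma_n / (n^{\eta/2}\alpha_n^{1+\eta/2}) \to 0 \quad \text{as } n \to \infty.$$

Suppose $E|Y|^{2+\eta} < \infty$ and the distinct points $x_1,\ldots,x_p$ are
continuity points of each of the functions $f(x)$, $m(x)$,
$s(x) = E[Y^2 \mid X=x]$ and $E[|Y|^{2+\eta} \mid X=x]$, and that $f(x_i) > 0$,
$i = 1,\ldots,p$. Then

    $$(Z_n(x_1), \ldots, Z_n(x_p))$$

converges in distribution to a multivariate normal random vector with
zero mean vector and identity covariance matrix, where

    $$Z_n(x) = \frac{m_n(x) - g_n(x)}{\{(\alpha_n/n)\,\sigma^2(x)/f(x)\}^{1/2}},$$

    $$g_n(x) = Em_n^*(x)/Ef_n(x),$$

and

    $$\sigma^2(x) = s(x) - m^2(x).$$

□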


Since $m_n = m_n^*/f_n$ is a ratio of sums of random variables, a direct
central limit argument is not possible. However, note that

    $$m_n - g_n = [\,m_n^*/f - (f_n/f)\,g_n\,]\,(f/f_n)$$

and $f/f_n \xrightarrow{P} 1$, so that $m_n - g_n$ will have the same asymptotic
distribution as the term within square brackets above. The term within
square brackets is a sum of random variables, and thus standard arguments
may be used to establish its asymptotic normality. This is the outline which
the proof of Theorem 2.4.2 will follow, although the notation will be more
complicated since the proof will be in a multivariate setting.
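
The decomposition above is a purely algebraic identity, so it can be checked numerically for any sample. The following is a minimal sketch, assuming a Gaussian kernel-type $\delta$-function and simulated data; the function names and the values passed for $f(x)$ and $g_n(x)$ are illustrative placeholders, not quantities taken from the text.

```python
import numpy as np

def gaussian_delta(u, eps):
    """Kernel-type delta function eps^{-1} K(u / eps) with a standard normal K (an assumption)."""
    return np.exp(-0.5 * (u / eps) ** 2) / (eps * np.sqrt(2.0 * np.pi))

def decomposition(x, X, Y, eps, f_x, g_x):
    """Return both sides of m_n - g_n = [m_n^*/f - (f_n/f) g_n] (f/f_n) at the point x."""
    w = gaussian_delta(x - X, eps)
    m_star = np.mean(Y * w)            # numerator m_n^*(x)
    f_n = np.mean(w)                   # density estimate f_n(x)
    lhs = m_star / f_n - g_x           # m_n(x) - g_n(x)
    H = (m_star - g_x * f_n) / f_x     # bracketed term: an average of i.i.d. summands
    R = f_x / f_n                      # ratio that tends to 1 in probability
    return lhs, H * R

rng = np.random.default_rng(0)
X = rng.normal(size=500)
Y = np.sin(X) + 0.3 * rng.normal(size=500)
# f_x and g_x are placeholder values; the identity holds for any nonzero f_x and any g_x.
lhs, rhs = decomposition(0.5, X, Y, eps=0.3, f_x=0.352, g_x=0.47)
```

The point of the identity is that `H` is a normalized sum of i.i.d. terms, so a central limit theorem applies to it directly, while `R` only contributes a factor converging to one.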

The following lemma establishes the asymptotic variance and covariance
of $m_n^*/f - (f_n/f)\,g_n$.

2.4.3  Lemma. Let $\{\delta_n\}$ be a $\delta$-function sequence such that
$\alpha_n = \int \delta_n^2(u)\,du < \infty$ for each n. Suppose $x \ne y$ are
continuity points of f, m, and s, and that $f(x) > 0$, $f(y) > 0$, and
$EY^2 < \infty$. Define

    $$H_n(z) = [\,m_n^*(z) - g_n(z)f_n(z)\,]/f(z)$$

and

    $$R_n(z) = f(z)/f_n(z).$$

Then

(i)  $(n/\alpha_n)\,\mathrm{Var}[H_n(x)] \to \sigma^2(x)/f(x)$

and

(ii)  $n\,\mathrm{Cov}[H_n(x), H_n(y)] \to 0$ as $n \to \infty$.

Proof. By definition,

    $$H_n(x) = (nf(x))^{-1}\sum_{i=1}^n (Y_i - g_n(x))\,\delta_n(x - X_i)$$

and

    $$EH_n(x) = Em_n^*(x)/f(x) - g_n(x)\,Ef_n(x)/f(x) = 0,$$

since

    $$g_n(x) = Em_n^*(x)/Ef_n(x),$$

and we thus have

    $$\mathrm{Var}[H_n(x)] = EH_n^2(x)
      = (nf(x))^{-2}E\Big\{\sum_{i}[(Y_i - g_n(x))\delta_n(x - X_i)]^2
      + \sum_{i \ne j}[(Y_i - g_n(x))\delta_n(x - X_i)][(Y_j - g_n(x))\delta_n(x - X_j)]\Big\}.$$

Now

    $$E[(Y - g_n(x))\,\delta_n(x - X)] = 0,$$

since

    $$Em_n^*(x) = EY\delta_n(x - X), \qquad Ef_n(x) = E\delta_n(x - X).$$

Hence

    $$(n/\alpha_n)\,\mathrm{Var}[H_n(x)]
      = \alpha_n^{-1}(f(x))^{-2}E[(Y - g_n(x))\,\delta_n(x - X)]^2$$
    $$= \alpha_n^{-1}(f(x))^{-2}\Big\{\int s(u)f(u)\delta_n^2(x-u)\,du
      - 2g_n(x)\int m(u)f(u)\delta_n^2(x-u)\,du
      + g_n^2(x)\int f(u)\delta_n^2(x-u)\,du\Big\}$$
    $$\to (f(x))^{-2}\{s(x)f(x) - m^2(x)f(x)\}$$
    $$= \sigma^2(x)/f(x),$$

since $\{\delta_n^2(u)/\alpha_n\}$ is a $\delta$-function sequence by Lemma 2.1.4 and
$g_n(x) \to m(x)$. For (ii), note

    $$n\,\mathrm{Cov}[H_n(x), H_n(y)] = nEH_n(x)H_n(y)$$
    $$= (nf(x)f(y))^{-1}E\sum_{i,j=1}^n [(Y_i - g_n(x))\delta_n(x - X_i)]\,[(Y_j - g_n(y))\delta_n(y - X_j)]$$
    $$= (nf(x)f(y))^{-1}\sum_{i=1}^n E[(Y_i - g_n(x))\delta_n(x - X_i)]\,[(Y_i - g_n(y))\delta_n(y - X_i)]$$
    $$= (f(x)f(y))^{-1}\int \delta_n(x-u)\,\delta_n(y-u)\,q_n(u)\,du,$$

where

    $$q_n(u) = f(u)[\,s(u) - m(u)(g_n(x) + g_n(y)) + g_n(x)g_n(y)\,]$$

is continuous at $u = x$ and $u = y$ by assumption. Thus

    $$n\,\mathrm{Cov}[H_n(x), H_n(y)] \to 0$$

by Lemma 2.1.6, and (ii) is true.


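□

We now return to the proof of Theorem 2.4.2. By the Cramér-Wold
device (e.g., Billingsley (1968)), it suffices to show

    $$\sum_{k=1}^p t_kZ_n(x_k) \xrightarrow{d} N\Big(0, \sum_{k=1}^p t_k^2\Big)$$

or, equivalently,

    $$\frac{\sum_{k=1}^p t_k[\,m_n(x_k) - g_n(x_k)\,]}
           {\{(\alpha_n/n)\sum_{k=1}^p t_k^2\,\sigma^2(x_k)/f(x_k)\}^{1/2}}
      \xrightarrow{d} N(0,1),$$

for any real numbers $t_1, t_2, \ldots, t_p$. Write

    $$m_n(x) - g_n(x) = H_n(x)R_n(x),$$

where $H_n$ and $R_n$ are as defined in Lemma 2.4.3. Since
$f_n(x_k) \xrightarrow{P} f(x_k)$, $k = 1,\ldots,p$, it follows that
$R_n(x_k) \xrightarrow{P} 1$, $k = 1,\ldots,p$, and it thus suffices to show that

    $$\frac{\sum_{k=1}^p t_kH_n(x_k)}
           {\{(\alpha_n/n)\sum_{k=1}^p t_k^2\,\sigma^2(x_k)/f(x_k)\}^{1/2}}
      \xrightarrow{d} N(0,1).$$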

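Now

    $$\gamma_n = \int |\delta_n(u)|^{2+\eta}\,du < \infty$$

for each n implies

    $$\alpha_n = \int \delta_n^2(u)\,du < \infty$$

for each n, since $\int_{|u|<\lambda}\delta_n^2(u)\,du < \infty$ for any finite
$\lambda > 0$ by Hölder's inequality, and by C1 and C3 of Definition 2.1.1.
Thus we have, by Lemma 2.4.3,

    $$\mathrm{Var}\Big[\sum_{k=1}^p t_kH_n(x_k)\Big]
      = \sum_{k=1}^p\sum_{j=1}^p t_kt_j\,\mathrm{Cov}[H_n(x_k), H_n(x_j)]
      \sim (\alpha_n/n)\sum_{k=1}^p t_k^2\,\sigma^2(x_k)/f(x_k)$$

as $n \to \infty$. Hence it suffices to prove

    $$\frac{\sum_{k=1}^p t_kH_n(x_k)}
           {\{\mathrm{Var}[\sum_{k=1}^p t_kH_n(x_k)]\}^{1/2}}
      \xrightarrow{d} N(0,1).$$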

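Since, by definition,

    $$H_n(x) = (nf(x))^{-1}\sum_{i=1}^n (Y_i - g_n(x))\,\delta_n(x - X_i),$$

we may write

    $$V_n = \frac{\sum_{k=1}^p t_kH_n(x_k)}
                 {\{\mathrm{Var}[\sum_{k=1}^p t_kH_n(x_k)]\}^{1/2}}
          = \sum_{i=1}^n V_{n,i},$$

where the i.i.d. random variables $V_{n,i}$, $i = 1,\ldots,n$, are defined by

    $$V_{n,i} = n^{-1/2}\sigma_n^{-1}\sum_{k=1}^p (t_k/f(x_k))\,(Y_i - g_n(x_k))\,\delta_n(x_k - X_i),$$

where

    $$\sigma_n^2 = \mathrm{Var}\Big\{\sum_{k=1}^p (t_k/f(x_k))\,(Y - g_n(x_k))\,\delta_n(x_k - X)\Big\}.$$

It then follows from the Lindeberg-Feller central limit theorem that
if, for some $\eta > 0$,

    $$nE|V_{n,1}|^{2+\eta} \to 0,$$

then

    $$V_n \xrightarrow{d} N(0,1) \quad \text{as } n \to \infty.$$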

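Now by applying the $c_r$ inequality of Loève (1963) repeatedly we have

    $$nE|V_{n,1}|^{2+\eta}
      \le \sum_{k=1}^p c_k(\eta)\Big\{
        \frac{E|Y\delta_n(x_k - X)|^{2+\eta}}{n^{\eta/2}\sigma_n^{2+\eta}(f(x_k))^{2+\eta}}
      + \frac{E|g_n(x_k)\delta_n(x_k - X)|^{2+\eta}}{n^{\eta/2}\sigma_n^{2+\eta}(f(x_k))^{2+\eta}}\Big\}
      = \sum_{k=1}^p c_k(\eta)\,[\,I_{k,n} + J_{k,n}\,],$$

where $c_k(\eta)$ depends only on k and $\eta$ and the constants $t_1,\ldots,t_p$. It
is easily seen that

    $$\sigma_n^2 = n\,\mathrm{Var}\Big[\sum_{k=1}^p t_kH_n(x_k)\Big]
      \sim \alpha_n\sum_{k=1}^p t_k^2\,\sigma^2(x_k)/f(x_k),$$

the last step following by earlier calculations. Further,

    $$E|Y\delta_n(x - X)|^{2+\eta}
      = \int |\delta_n(x-u)|^{2+\eta}E[\,|Y|^{2+\eta} \mid X=u\,]f(u)\,du
      = \gamma_n\int \delta_n^*(x-u)E[\,|Y|^{2+\eta} \mid X=u\,]f(u)\,du,$$

where $\{\delta_n^*\} = \{|\delta_n|^{2+\eta}/\gamma_n\}$ is a $\delta$-function sequence
by Lemma 2.1.4. Thus, for $k = 1,2,\ldots,p$, we have

    $$I_{k,n} \sim \frac{\gamma_n\,E[\,|Y|^{2+\eta} \mid X=x_k\,]\,f(x_k)}
      {n^{\eta/2}\alpha_n^{1+\eta/2}(f(x_k))^{2+\eta}
       [\sum_{j=1}^p t_j^2\,\sigma^2(x_j)/f(x_j)]^{1+\eta/2}}
      \to 0 \quad \text{as } n \to \infty,$$

since

    $$\gamma_n/(n^{\eta/2}\alpha_n^{1+\eta/2}) \to 0$$

by assumption. Similar calculations yield

    $$J_{k,n} \sim \frac{\gamma_n\,|m(x_k)|^{2+\eta}\,f(x_k)}
      {n^{\eta/2}\alpha_n^{1+\eta/2}(f(x_k))^{2+\eta}
       [\sum_{j=1}^p t_j^2\,\sigma^2(x_j)/f(x_j)]^{1+\eta/2}}
      \to 0 \quad \text{as } n \to \infty,$$

and the proof is complete.

□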


We now give a version of Theorem 2.4.2 for kernel-type $\delta$-function
sequences, which may be compared with Schuster's Theorem 2.4.1.

2.4.4  Theorem. Suppose $\{\delta_n(x)\} = \{\epsilon_n^{-1}K(x/\epsilon_n)\}$ is a
$\delta$-function sequence of kernel type satisfying

(i)  $\int |K(u)|^{2+\eta}\,du < \infty$ for some $\eta > 0$,

(ii)  $\int uK(u)\,du = 0$, $\int u^2K(u)\,du < \infty$,

(iii)  $n\epsilon_n \to \infty$, $n\epsilon_n^5 \to 0$ as $n \to \infty$.

Suppose $m(x)$ and $f(x)$ have bounded, continuous 1st and 2nd derivatives,
$E|Y|^{2+\eta} < \infty$, the distinct points $x_1,\ldots,x_p$ are continuity points
of $s(x)$ and $E[\,|Y|^{2+\eta} \mid X=x\,]$, and $f(x_k) > 0$, $k = 1,\ldots,p$. Then
$(Z_n'(x_1), \ldots, Z_n'(x_p))$ converges in distribution to a multivariate
normal random vector with zero mean vector and identity covariance matrix,
where

    $$Z_n'(x) = \frac{(n\epsilon_n)^{1/2}(m_n(x) - m(x))}
                     {\{\sigma^2(x)\int K^2(u)\,du / f(x)\}^{1/2}}.$$

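Proof. We first verify that this $\delta$-function sequence satisfies the
conditions of Theorem 2.4.2. Now

    $$\alpha_n = \int \delta_n^2(u)\,du = \epsilon_n^{-1}\int K^2(u)\,du < \infty$$

for each n since $\epsilon_n \ne 0$, $\int K^2(u)\,du < \infty$. Further,

    $$\alpha_n/n = (n\epsilon_n)^{-1}\int K^2(u)\,du \to 0$$

since $n\epsilon_n \to \infty$ by assumption. Similarly,

    $$\gamma_n = \int |\delta_n(u)|^{2+\eta}\,du
      = (1/\epsilon_n)^{1+\eta}\int |K(u)|^{2+\eta}\,du < \infty$$

for each n, and

    $$\gamma_n/(n^{\eta/2}\alpha_n^{1+\eta/2}) \propto (n\epsilon_n)^{-\eta/2} \to 0
      \quad \text{as } n \to \infty$$

by assumption. Thus this type of $\delta$-function sequence satisfies the
requirements of Theorem 2.4.2, and the remaining regularity conditions
of Theorem 2.4.2 are clearly satisfied under the present assumptions.
Theorem 2.4.2 therefore applies with centering $g_n(x_k)$. To center at
$m(x_k)$ instead, it remains to note that
$(n\epsilon_n)^{1/2}(g_n(x_k) - m(x_k)) \to 0$, $k = 1,\ldots,p$, which follows from
condition (ii), the assumed smoothness of m and f, and the second part of
condition (iii).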
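
As an illustration of how Theorem 2.4.4 might be used in practice, the sketch below computes the kernel estimate $m_n(x)$ together with a pointwise large-sample interval obtained by inverting the normalization $Z_n'(x)$. It assumes an Epanechnikov kernel (for which $\int K^2(u)\,du = 0.6$) and replaces the unknown $\sigma^2(x)$ and $f(x)$ by kernel plug-in estimates; the plug-in step and all function names are assumptions of this sketch and are not part of the theorem.

```python
import numpy as np

K2_INTEGRAL = 0.6  # integral of K^2 for the Epanechnikov kernel defined below

def epanechnikov(u):
    """K(u) = 0.75 (1 - u^2) on [-1, 1] and zero elsewhere."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)

def nw_with_interval(x, X, Y, eps, z=1.96):
    """Return (m_n(x), lower, upper) using the N(0, 1) limit of Z'_n(x)."""
    n = len(X)
    w = epanechnikov((x - X) / eps) / eps      # delta_n(x - X_i)
    f_n = np.mean(w)                           # density estimate f_n(x)
    m_n = np.mean(Y * w) / f_n                 # m_n(x) = m_n^*(x) / f_n(x)
    s_n = np.mean(Y ** 2 * w) / f_n            # plug-in estimate of s(x) = E[Y^2 | X = x]
    sigma2 = max(s_n - m_n ** 2, 0.0)          # plug-in estimate of sigma^2(x)
    half = z * np.sqrt(sigma2 * K2_INTEGRAL / (n * eps * f_n))
    return m_n, m_n - half, m_n + half

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=2000)
Y = X + 0.1 * rng.normal(size=2000)            # simulated data with m(x) = x
m_hat, lower, upper = nw_with_interval(0.5, X, Y, eps=0.2)
```

The interval is only as good as the centering at $m(x)$, which is why the bandwidth must shrink fast enough for the bias to be negligible relative to $(n\epsilon_n)^{-1/2}$.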


2.5  Asymptotic Distribution of $\bar m_n$.

It is evident that, since $\bar m_n = m_n^*/f$ is a sum of independent random
variables, we may apply the Lindeberg-Feller central limit theorem in
much the same way as we did in Theorem 2.4.2 to establish the asymptotic
normality of $(\bar m_n(x) - E\bar m_n(x))/\{\mathrm{Var}[\bar m_n(x)]\}^{1/2}$. We
established in (ii) of Lemma 2.3.1 that

    $$\mathrm{Var}[m_n^*(x)] \sim (\alpha_n/n)\,s(x)f(x)$$

for appropriate points x. Hence we have

    $$\mathrm{Var}[\bar m_n(x)] \sim (\alpha_n/n)\,s(x)/f(x).$$

We therefore have the following theorems, which we state without proof,
since the proofs follow those of Theorems 2.4.2 and 2.4.4 very closely.

The first theorem concerns the asymptotic normality of $\bar m_n$ for general
$\delta$-function sequences; the second for kernel type $\delta$-sequences.

2.5.1  Theorem. Under the conditions of Theorem 2.4.2,
$(W_n(x_1), \ldots, W_n(x_p))$ converges in distribution to a multivariate normal
random vector with zero mean vector and identity covariance matrix, where

    $$W_n(x) = \frac{\bar m_n(x) - E\bar m_n(x)}{\{(\alpha_n/n)\,s(x)/f(x)\}^{1/2}}.$$

2.5.2  Theorem. Under the conditions of Theorem 2.4.4,
$(W_n'(x_1), \ldots, W_n'(x_p))$ converges in distribution to a multivariate normal
random vector with zero mean and identity covariance matrix, where

    $$W_n'(x) = \frac{(n\epsilon_n)^{1/2}(\bar m_n(x) - m(x))}
                     {\{s(x)\int K^2(u)\,du / f(x)\}^{1/2}}.$$

The mean integrated square error (MISE) $J_n$ of an estimator

    $$f_n(x) = n^{-1}\sum_{i=1}^n \delta_n(x - X_i)$$

of a density f is defined as

    $$J_n = E\int (f_n(x) - f(x))^2\,dx,$$

where $\delta_n$ and f are assumed to be square integrable. Watson and Leadbetter
(1963) show that $J_n$ is minimized for each n if $\delta_n$ is chosen to have a
Fourier transform expressible as

    $$\phi_{\delta_n}(t) = \frac{|\phi_f(t)|^2}{(1/n) + ((n-1)/n)\,|\phi_f(t)|^2},$$

where $\phi_f$ is the Fourier transform of f. (Fourier transforms of square
integrable functions have the usual interpretation here.) For the
regression estimation problem, Watson (1964) considers the error criterion
defined by

    $$J_n = E\int \Big[\sum_{i=1}^n \delta_n(x - X_i)\,m(x) - \sum_{i=1}^n Y_i\,\delta_n(x - X_i)\Big]^2 dx,$$

where appropriate assumptions are made on $\delta_n$ and m to insure the
finiteness of the integral. Watson states that this criterion is minimized
for each n if $\delta_n$ is chosen so as to have Fourier transform

    $$\phi_{\delta_n}(t) = \frac{|\phi_{fm}(t)|^2}{n^{-1}EY^2 + ((n-1)/n)\,|\phi_{fm}(t)|^2},$$

where $\phi_{fm}$ is the Fourier transform of fm.

We assume here that $\delta_n$ and fm are square integrable functions and
that $EY^2 < \infty$. We define the error criterion $I_n$ by

    $$I_n = E\int (m_n^*(x) - f(x)m(x))^2\,dx,$$

where $m_n^*$ is the numerator of $m_n$. We will show here that $I_n$ is also
minimized for each n by choosing $\delta_n$ to have Fourier transform given by
$\phi_{\delta_n}$ defined above. Note that $I_n$ may be interpreted as the MISE of the
numerator of $m_n$ or $\bar m_n$, disregarding the denominator.
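
To make the form of the optimal transform concrete, the sketch below evaluates it for given values of $|\phi_{fm}(t)|^2$, $EY^2$ and n, and checks numerically that it minimizes the pointwise quadratic to which the derivation later in this section reduces the criterion. The function names are illustrative, and the numerical inputs are arbitrary values chosen so that $|\phi_{fm}(t)|^2 \le EY^2$.

```python
import numpy as np

def optimal_transform(phi_fm_abs2, ey2, n):
    """phi_{delta_n}(t) = |phi_fm(t)|^2 / (EY^2 / n + ((n - 1) / n) |phi_fm(t)|^2)."""
    a = np.asarray(phi_fm_abs2, dtype=float)
    return a / (ey2 / n + (n - 1) / n * a)

def pointwise_risk(z, a, ey2, n):
    """a - 2 a Re(z) + |z|^2 (EY^2 / n + ((n - 1) / n) a), with a = |phi_fm(t)|^2.

    This is E|phi_{m_n^*}(t) - phi_fm(t)|^2 viewed as a function of z = phi_{delta_n}(t).
    """
    d = ey2 / n + (n - 1) / n * a
    return a - 2.0 * a * np.real(z) + np.abs(z) ** 2 * d

# Arbitrary illustrative inputs: |phi_fm(t)|^2 = 0.5, EY^2 = 2, n = 50.
z_star = optimal_transform(0.5, 2.0, 50)
risk_at_optimum = pointwise_risk(z_star, 0.5, 2.0, 50)
```

Because $|\phi_{fm}(t)|^2 \le EY^2$, the optimal transform lies in $[0,1]$ and tends to 1 as n grows, so the optimal $\delta_n$ concentrates at the origin in the limit.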

By the definition of $I_n$ and Parseval's formula, we have

(2.6.1)
    $$I_n = E\int (m_n^*(x) - f(x)m(x))^2\,dx
          = (2\pi)^{-1}E\int |\phi_{m_n^*}(t) - \phi_{fm}(t)|^2\,dt,$$

where $\phi_{m_n^*}$ is the Fourier transform of $m_n^*$, so that $I_n$ may be
minimized by minimizing the extreme right hand side of expression (2.6.1)
above. Now, by Fubini's theorem for positive functions,

    $$E\int |\phi_{m_n^*}(t) - \phi_{fm}(t)|^2\,dt = \int E|\phi_{m_n^*}(t) - \phi_{fm}(t)|^2\,dt,$$

so that $I_n$ may be minimized by minimizing

(2.6.2)
    $$E|\phi_{m_n^*}(t) - \phi_{fm}(t)|^2
      = E\{\,|\phi_{m_n^*}(t)|^2 + |\phi_{fm}(t)|^2
        - [\,\phi_{m_n^*}(t)\overline{\phi_{fm}(t)} + \overline{\phi_{m_n^*}(t)}\,\phi_{fm}(t)\,]\,\}$$

for each t, where $\bar g$ denotes the conjugate of the complex function g.
Note that since fm is an integrable function,

(2.6.3)
    $$\phi_{fm}(t) = \int e^{itu}f(u)m(u)\,du.$$

Further,

(2.6.4)
    $$\phi_{m_n^*}(t) = \int \Big[n^{-1}\sum_{j=1}^n Y_j\,\delta_n(u - X_j)\Big]e^{itu}\,du
      = n^{-1}\sum_{j=1}^n Y_je^{itX_j}\int \delta_n(u)e^{itu}\,du
      = \phi_{\delta_n}(t)\Big[n^{-1}\sum_{j=1}^n Y_je^{itX_j}\Big],$$

so that

(2.6.5)
    $$E|\phi_{m_n^*}(t)|^2
      = |\phi_{\delta_n}(t)|^2\,E\Big|n^{-1}\sum_{j=1}^n Y_je^{itX_j}\Big|^2
      = |\phi_{\delta_n}(t)|^2\,n^{-2}E\sum_{j,k=1}^n Y_jY_ke^{itX_j}e^{-itX_k}$$
    $$= |\phi_{\delta_n}(t)|^2\,n^{-2}\Big(E\sum_{j=1}^n Y_j^2
        + E\sum_{j \ne k} Y_jY_ke^{itX_j}e^{-itX_k}\Big).$$

Now

    $$EYe^{itX} = \int m(u)f(u)e^{itu}\,du = \phi_{fm}(t),$$

and thus

(2.6.5)'
    $$E|\phi_{m_n^*}(t)|^2 = |\phi_{\delta_n}(t)|^2\,[\,(1/n)EY^2 + ((n-1)/n)\,|\phi_{fm}(t)|^2\,].$$


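Finally, from (2.6.3) and (2.6.4), we have

(2.6.6)
    $$E[\,\overline{\phi_{fm}(t)}\,\phi_{m_n^*}(t) + \phi_{fm}(t)\,\overline{\phi_{m_n^*}(t)}\,]$$
    $$= \overline{\phi_{fm}(t)}\,\phi_{\delta_n}(t)\,E\Big[n^{-1}\sum_{j=1}^n Y_je^{itX_j}\Big]
      + \phi_{fm}(t)\,\overline{\phi_{\delta_n}(t)}\,E\Big[n^{-1}\sum_{j=1}^n Y_je^{-itX_j}\Big]$$
    $$= |\phi_{fm}(t)|^2\,[\,\phi_{\delta_n}(t) + \overline{\phi_{\delta_n}(t)}\,]$$
    $$= 2\,\mathrm{Re}[\phi_{\delta_n}(t)]\,|\phi_{fm}(t)|^2.$$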

Combining (2.6.2), (2.6.5)' and (2.6.6) yields

(2.6.7)
    $$E|\phi_{m_n^*}(t) - \phi_{fm}(t)|^2
      = |\phi_{fm}(t)|^2 - 2\,\mathrm{Re}[\phi_{\delta_n}(t)]\,|\phi_{fm}(t)|^2
        + |\phi_{\delta_n}(t)|^2\,[\,(1/n)EY^2 + ((n-1)/n)\,|\phi_{fm}(t)|^2\,]$$
    $$= [\,(1/n)EY^2 + ((n-1)/n)\,|\phi_{fm}(t)|^2\,]\,
        \Big|\phi_{\delta_n}(t) - \frac{|\phi_{fm}(t)|^2}{(1/n)EY^2 + ((n-1)/n)\,|\phi_{fm}(t)|^2}\Big|^2
        + \frac{|\phi_{fm}(t)|^2\,[\,EY^2 - |\phi_{fm}(t)|^2\,]}{EY^2 + (n-1)\,|\phi_{fm}(t)|^2},$$

the last equality following by completing the square and rearranging
terms. Now

    $$|\phi_{fm}(t)| = \Big|\int m(u)f(u)e^{itu}\,du\Big|
      \le \int |m(u)|\,f(u)\,du
      \le \int E[\,|Y| \mid X=u\,]\,f(u)\,du = E|Y|,$$

so that

    $$|\phi_{fm}(t)|^2 \le (E|Y|)^2 \le EY^2$$

for all t, and hence (2.6.7) is minimized for each t by choosing

    $$\phi_{\delta_n}(t) = \frac{|\phi_{fm}(t)|^2}{(1/n)EY^2 + ((n-1)/n)\,|\phi_{fm}(t)|^2},$$

that is, the Fourier transform given above.


3.  ASYMPTOTIC PROPERTIES OF MAXIMUM ABSOLUTE DEVIATION


3.1  Preliminaries

Since our goal in many cases is the estimation of the regression
function over the entire real line, or some subset of the real line, it
is natural to investigate the behavior of our estimators under some
global error criterion. An attempt in this direction was made in
Section 2.6, where we considered mean integrated square error. This was
not entirely satisfactory, however, since we were only able to determine
the $\delta$-function sequence which minimized the MISE of the numerator of the
estimators in question, disregarding the denominator. In this chapter,
we consider a different global error criterion, the maximum absolute
deviation, defined as $\sup_{x \in I}|m_n(x) - m(x)|$, where I is a closed, bounded
interval of the real line, which we will take without loss of generality
to be [0,1]. We shall mainly be concerned here with conditions under
which the maximum absolute deviation converges to zero in probability
(in this case we say that the estimator in question is uniformly
consistent over I). We will also be able to find a large sample confidence
bound for the regression function, based on the estimator $\bar m_n$.

Our method of analysis will follow the one briefly outlined below
used by Bickel and Rosenblatt (1973) and Rosenblatt (1976) for probability
density estimators. For a density function estimator

    $$f_n(u) = (n\epsilon_n)^{-1}\sum_{i=1}^n K((u - X_i)/\epsilon_n),$$

the deviation about the mean $f_n(u) - Ef_n(u)$, normalized so as to have
non-zero asymptotic standard deviation, may be written as

(3.1.1)
    $$\frac{(n\epsilon_n)^{1/2}(f_n(u) - Ef_n(u))}{[f(u)]^{1/2}}
      = [f(u)\epsilon_n]^{-1/2}\int K((u-s)/\epsilon_n)\,dZ_n(s)
      = Y_n(u), \text{ say},$$

where $Z_n$ is the empirical process defined by

    $$Z_n(s) = n^{1/2}(F_n(s) - F(s)),$$

where $F_n$ is the empirical distribution function (EDF) of $X_i$,
$i = 1,\ldots,n$, and F is the cumulative distribution function of $X_1$.
Komlós, Major and Tusnády (1975) have shown that a sequence of Brownian
bridges $\{B_n\}$ on [0,1] may be constructed such that

(3.1.2)
    $$\sup_{-\infty<u<\infty}|Z_n(u) - B_n(F(u))| = O(n^{-1/2}\log n)$$

a.s. This fact is exploited, using integration by parts in (3.1.1), to
show that

    $$(\log n)^{1/2}\sup_{0 \le u \le 1}|Y_n(u)|
      = (\log n)^{1/2}\sup_{0 \le u \le 1}|Y_{1,n}(u)| + o_p(1),$$

where $Y_{1,n}$ is the stochastic process obtained by replacing $Z_n(s)$ with
$B_n(F(s))$ in the defining expression for $Y_n$. Further stages of
approximation finally yield

(3.1.3)
    $$(\log n)^{1/2}\sup_{0 \le u \le 1}|Y_n(u)|
      = (\log n)^{1/2}\sup_{0 \le u \le 1}|Y_{2,n}(u)| + o_p(1),$$

where $Y_{2,n}$ is the Gaussian process on [0,1] defined by

    $$Y_{2,n}(u) = \epsilon_n^{-1/2}\int K((u-s)/\epsilon_n)\,dW(s),$$

where W is a Wiener process on $\mathbb{R}$. The asymptotic distribution of
$(\log n)^{1/2}\sup_{0 \le u \le 1}|Y_{2,n}(u)|$, with proper centering constants, is
determined, and, in light of (3.1.3), $(\log n)^{1/2}\sup_{0 \le u \le 1}|Y_n(u)|$ has
the same asymptotic distribution.

We will employ this method to determine the asymptotic distribution
of the maximum absolute deviation of the numerator $m_n^*$ of the estimators
$m_n$ and $\bar m_n$, properly normalized and centered. Algebraic manipulation and
elementary analysis will then yield uniform consistency of the estimators,
with an associated rate of convergence. Since the denominator of $\bar m_n$
is non-stochastic, an asymptotic confidence band for m, based on $\bar m_n$,
may also be specified.

In the forthcoming development, we will need to use integrals of
the form

(3.1.4)
    $$Y_n(t) = \iint yK((t-x)/\epsilon_n)\,dW(T(x,y)),$$

where $T: \mathbb{R}^2 \to [0,1]^2$ is the transformation defined by

    $$T(x,y) = (F_{X|Y}(x|y),\ F_Y(y)),$$

and $W(\cdot,\cdot)$ is the Wiener process on $[0,1]^2$. In this section, we will
give conditions for the existence of (3.1.4) and prove some useful
properties.

If $H(s,t)$ is a real, measurable function on $[0,1]^2$, then it is
well-known that the $L_2$ integral

    $$\iint H(s,t)\,dW(s,t)$$

exists if

    $$\iint H^2(s,t)\,ds\,dt < \infty$$

(see Masani (1968), Chap. 5).

Suppose that $f(x,y) > 0$ for all real x and y so that T is one-to-one
and hence $T^{-1}$ is a well-defined function on $[0,1]^2$ to $\mathbb{R}^2$. Denote,
for fixed n and t,

    $$G_t(x,y) = yK((t-x)/\epsilon_n).$$

Then, by Theorem 5.19 of Masani (1968), we have

(3.1.5)
    $$\iint_{\mathbb{R}^2} yK((t-x)/\epsilon_n)\,dW(T(x,y))
      = \iint_{[0,1]^2} G_t(T^{-1}(s,u))\,dW(s,u)$$

in the sense that if either integral exists, then so does the other
and they are equal. By the previous remark, the integral on the right
hand side of (3.1.5) exists if

    $$\iint_{[0,1]^2} G_t^2(T^{-1}(s,u))\,ds\,du < \infty.$$

Now

(3.1.6)
    $$\iint_{[0,1]^2} G_t^2(T^{-1}(s,u))\,ds\,du
      = \iint_{\mathbb{R}^2} G_t^2(x,y)\,|J(x,y)|\,dx\,dy,$$

where $J(x,y)$ is the Jacobian of T (see, e.g., Buck (1965), Sec. 6.1,
Thm. 4), if $|J(x,y)| > 0$ for all real x and y and $G_t(x,y)$, $f(x,y)$ and
$f_Y(y)$ are continuous. By definition,

    $$J(x,y) = \begin{vmatrix}
        \frac{\partial}{\partial x}F_{X|Y}(x|y) & \frac{\partial}{\partial y}F_{X|Y}(x|y) \\
        \frac{\partial}{\partial x}F_Y(y) & \frac{\partial}{\partial y}F_Y(y)
      \end{vmatrix}
      = f_{X|Y}(x|y)\,f_Y(y) = f(x,y) > 0$$

by assumption, using the obvious notation for conditional and marginal
densities. Thus

    $$\iint_{[0,1]^2} G_t^2(T^{-1}(s,u))\,ds\,du
      = \iint_{\mathbb{R}^2} y^2K^2((t-x)/\epsilon_n)\,f(x,y)\,dx\,dy
      = EY^2K^2((t-X)/\epsilon_n) < \infty$$

if, e.g., $EY^2 < \infty$ and K is bounded. We note that the above development
holds if, instead of having $f(x,y) > 0$ for all real x and y, we have
$f(x,y) > 0$ for x and y in some rectangle of $\mathbb{R}^2$, and the range of
integration is appropriately adjusted. We will henceforth assume this
to be true without comment.

We will now give properties of the integral (3.1.4) which will be
useful in the future development. We will show

(3.1.7)
    $$EY_n(t) = 0,$$

(3.1.8)
    $$EY_n(t_1)Y_n(t_2)
      = \iint y^2K((t_1-x)/\epsilon_n)\,K((t_2-x)/\epsilon_n)\,f(x,y)\,dx\,dy,$$

for $t_1 \ne t_2$. In view of (3.1.5) and the definition of the stochastic
integral, (3.1.7) follows. For (3.1.8), we note, by (3.1.5) and (5.2)
of Masani (1968),

    $$EY_n(t_1)Y_n(t_2)
      = \iint_{[0,1]^2} G_{t_1}(T^{-1}(s,u))\,G_{t_2}(T^{-1}(s,u))\,ds\,du
      = \iint y^2K((t_1-x)/\epsilon_n)\,K((t_2-x)/\epsilon_n)\,f(x,y)\,dx\,dy,$$

as in (3.1.8).

We finally note to close this section that, since W is a Gaussian
process on $[0,1]^2$ and since $Y_n(t)$ is an $L_2$ limit of linear combinations
of $W(\cdot,\cdot)$, we have that $Y_n(t)$ is itself a (one-parameter) Gaussian
stochastic process for each n, with mean given by (3.1.7) and covariance
function given by (3.1.8).
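
The transformation $T(x,y) = (F_{X|Y}(x|y), F_Y(y))$ used throughout this section maps $(X,Y)$ to a pair that is uniformly distributed on the unit square with independent coordinates. The sketch below illustrates this for a standard bivariate normal pair with correlation $\rho$, for which $X \mid Y=y$ is $N(\rho y, 1-\rho^2)$. The bivariate normal model, the sample size and the function names are assumptions chosen only for illustration.

```python
import math
import numpy as np

def norm_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def transform_T(x, y, rho):
    """T(x, y) = (F_{X|Y}(x|y), F_Y(y)) for a standard bivariate normal with correlation rho."""
    u = norm_cdf((x - rho * y) / math.sqrt(1.0 - rho ** 2))
    v = norm_cdf(y)
    return u, v

rng = np.random.default_rng(1)
rho = 0.7
y = rng.normal(size=4000)
x = rho * y + math.sqrt(1.0 - rho ** 2) * rng.normal(size=4000)
uv = np.array([transform_T(a, b, rho) for a, b in zip(x, y)])
sample_corr = np.corrcoef(uv[:, 0], uv[:, 1])[0, 1]
```

Although x and y are strongly correlated, the transformed coordinates should show sample means near 1/2 and a sample correlation near zero, which is what allows results for the uniform empirical process on $[0,1]^2$ to be carried over to $(X,Y)$.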

3.2  Maximum Absolute Deviation of $m_n^*$

It is convenient to introduce certain assumptions at this point which
will be in force in our main theorem. Let $f(x,y)$ denote the joint density
of $(X,Y)$, $f_Y(y)$ the marginal density of Y, and let $\{a_n\}$ be a real
sequence with $a_n \to \infty$ as $n \to \infty$. We make the following assumptions:

(A1)  $(\log n)\,\epsilon_n^{-3}\int_{|y|>a_n} y^2f_Y(y)\,dy \le c$
      for all n and some constant c,

(A2)  $a_n\,\epsilon_n^{-1/2}\,n^{-1/6}(\log n)^2 \to 0$ as $n \to \infty$,

(A3)  $(\log n)\sup_{0 \le x \le 1}\int_{|y|>a_n} y^2f(x,y)\,dy \to 0$ as $n \to \infty$,

(A4)  There exists a constant $\eta > 0$ such that

    $$g_n(x) = \int_{-a_n}^{a_n} y^2f(x,y)\,dy$$

satisfies

    $$g_n(x) > \eta \quad \forall x \in [0,1] \text{ and some } n,$$

and $g_n$ has a continuous 1st derivative on some interval [-A,A].
Further, the functions

    $$s(x)f(x) = \int y^2f(x,y)\,dy,$$

    $$E[\,|Y| \mid X=x\,]\,f(x) = \int |y|\,f(x,y)\,dy$$

are uniformly bounded.

If Y is a bounded random variable, then clearly any sequence $\{a_n\}$
with $a_n \to \infty$ satisfies assumptions A1 and A3. If the marginal
distribution of Y is normal and $\epsilon_n = n^{-\delta}$ as in Theorem 3.2.1, then it is
readily checked that $\{a_n\} = \{\log n\}$ satisfies A1 and A2.
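
The claim for normal Y can be explored numerically. The sketch below assumes a standard normal marginal, $a_n = \log n$ and $\epsilon_n = n^{-\delta}$, and reads A1 as boundedness of $(\log n)\,\epsilon_n^{-3}\int_{|y|>a_n}y^2f_Y(y)\,dy$ and A2 as $a_n\epsilon_n^{-1/2}n^{-1/6}(\log n)^2 \to 0$; the function names are illustrative. It uses the closed form $\int_{|y|>a}y^2\varphi(y)\,dy = 2[a\varphi(a) + 1 - \Phi(a)]$ for the standard normal density $\varphi$, and works on the log scale for A2 because the convergence is very slow.

```python
import math

def a1_value(n, delta):
    """(log n) eps_n^{-3} * integral of y^2 phi(y) over |y| > a_n, with a_n = log n, eps_n = n^{-delta}."""
    a = math.log(n)
    phi = math.exp(-0.5 * a * a) / math.sqrt(2.0 * math.pi)
    tail = 2.0 * (a * phi + 0.5 * math.erfc(a / math.sqrt(2.0)))
    return a * n ** (3.0 * delta) * tail

def a2_log_value(log_n, delta):
    """log of a_n eps_n^{-1/2} n^{-1/6} (log n)^2 = log of (log n)^3 n^{delta/2 - 1/6}."""
    return 3.0 * math.log(log_n) + (delta / 2.0 - 1.0 / 6.0) * log_n

a1_small = a1_value(1e3, 0.2)
a1_large = a1_value(1e6, 0.2)
a2_at_1e100 = a2_log_value(math.log(1e100), 0.2)
a2_at_1e200 = a2_log_value(math.log(1e200), 0.2)
```

Under this reading, the A2 quantity equals $(\log n)^3 n^{\delta/2 - 1/6}$, so with $a_n = \log n$ it tends to zero only when $\delta < 1/3$, and then only for extremely large n.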

We normalize $m_n^*(t) - Em_n^*(t)$ by $(n\epsilon_n)^{-1/2}[s(t)f(t)]^{1/2}$, which is
proportional to its asymptotic standard deviation, thus defining the
following stochastic process on [0,1]:

(3.2.1)
    $$Y_n(t) = \frac{(n\epsilon_n)^{1/2}[\,m_n^*(t) - Em_n^*(t)\,]}{[s(t)f(t)]^{1/2}}.$$

Then we have the following theorem.


3.2.1  Theorem. Suppose the kernel function K vanishes outside a finite
interval [-A,A] and is absolutely continuous and has a bounded derivative
on [-A,A], and that the marginal density of X is positive on an interval
containing [0,1]. Suppose A1-A4 hold. Then, for $\epsilon_n = n^{-\delta}$,
$0 < \delta < 1/2$,

    $$P\Big\{(2\delta\log n)^{1/2}\Big[\sup_{0 \le t \le 1}\frac{|Y_n(t)|}{[\lambda(K)]^{1/2}} - d_n\Big] < x\Big\}
      \to e^{-2e^{-x}}$$

as $n \to \infty$, where

    $$\lambda(K) = \int K^2(u)\,du,$$

    $$d_n = (2\delta\log n)^{1/2} + (2\delta\log n)^{-1/2}
            \Big\{\log\frac{c_1(K)}{\pi^{1/2}} + \tfrac12[\,\log\delta + \log\log n\,]\Big\}$$

if

    $$c_1(K) = \frac{K^2(A) + K^2(-A)}{2\lambda(K)} > 0,$$

and otherwise

    $$d_n = (2\delta\log n)^{1/2} + (2\delta\log n)^{-1/2}
            \log\Big\{\frac{1}{\pi}\Big(\frac{c_2(K)}{2}\Big)^{1/2}\Big\},$$

where

    $$c_2(K) = \frac{\int [K'(u)]^2\,du}{2\lambda(K)}.$$
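
For later use in constructing confidence bands, the limit law in Theorem 3.2.1 can be inverted to obtain an approximate critical value for $\sup_t|Y_n(t)|$. The sketch below treats only the case $c_1(K) > 0$ and uses, purely as an illustration, the uniform kernel $K = 1/2$ on $[-1,1]$, for which $\lambda(K) = 1/2$ and $c_1(K) = 1/2$. The function names and the numerical inputs are assumptions of this sketch.

```python
import math

def d_n_boundary_case(n, delta, c1):
    """Centering constant d_n for the case c1(K) = (K^2(A) + K^2(-A)) / (2 lambda(K)) > 0."""
    L = 2.0 * delta * math.log(n)
    bracket = math.log(c1 / math.sqrt(math.pi)) + 0.5 * (math.log(delta) + math.log(math.log(n)))
    return math.sqrt(L) + bracket / math.sqrt(L)

def limit_quantile(p):
    """Solve exp(-2 exp(-x)) = p for x."""
    return -math.log(-0.5 * math.log(p))

def sup_critical_value(n, delta, c1, lam, p=0.95):
    """Value c such that P{ sup |Y_n(t)| < c } is approximately p for large n."""
    L = 2.0 * delta * math.log(n)
    return math.sqrt(lam) * (d_n_boundary_case(n, delta, c1) + limit_quantile(p) / math.sqrt(L))

# Uniform kernel on [-1, 1]: lambda(K) = 1/2 and c1(K) = 1/2 (illustrative choice).
crit_95 = sup_critical_value(1000, 0.2, c1=0.5, lam=0.5, p=0.95)
crit_99 = sup_critical_value(1000, 0.2, c1=0.5, lam=0.5, p=0.99)
```

Because the normalizing factor $(2\delta\log n)^{1/2}$ grows only logarithmically, such critical values should be regarded as rough guides at moderate sample sizes.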

The proof of Theorem 3.2.1 is based on Theorems 3.2.2 and 3.2.3,
which follow. Theorem 3.2.2 is due to Bickel and Rosenblatt (1973),
who used it in proving a result similar to Theorem 3.2.1 for probability
density estimators. We will here denote by $\int K(t)\,dW(t)$ the $L_2$ integral
of K with respect to the Wiener process W (see, e.g., Doob (1953),
Chap. IX, Sec. 2).

3.2.2  Theorem. Suppose $K(\cdot)$ is a kernel function which vanishes outside
[-A,A] and is absolutely continuous on [-A,A]. Define on [0,1] the
stochastic process

    $$Z_n(t) = \epsilon_n^{-1/2}\int K((t-x)/\epsilon_n)\,dW(x),$$

where

    $$\epsilon_n = n^{-\delta}$$

with $0 < \delta < 1/2$ and $W(x)$ is a Wiener process on $(-\infty,\infty)$. Then

    $$P\Big\{(2\delta\log n)^{1/2}\Big[\sup_{0 \le t \le 1}\frac{|Z_n(t)|}{[\lambda(K)]^{1/2}} - d_n\Big] < x\Big\}
      \to e^{-2e^{-x}}$$

as $n \to \infty$, where $d_n$ and $\lambda(K)$ are as in Theorem 3.2.1.

Theorem 3.2.3 is a special case of Theorem E of Révész (1976).

3.2.3  Theorem. Let $X_1$ and $X_2$ be independent random variables, each
uniformly distributed over [0,1]. Define the empirical process of
$(X_1,X_2)$ by

    $$Z_n(x_1,x_2) = n^{1/2}[\,F_n(x_1,x_2) - x_1x_2\,]$$

on $[0,1]^2$, where $F_n$ denotes the empirical distribution function of
$(X_1,X_2)$. Then one can define a sequence $\{B_n\}$ of independent Brownian
bridges on $[0,1]^2$ such that

(3.2.2)
    $$\sup_{0 \le x_1,x_2 \le 1}|Z_n(x_1,x_2) - B_n(x_1,x_2)|
      = O(n^{-1/6}(\log n)^{3/2}) \quad \text{a.s.}$$

Proof of Theorem 3.2.1: For convenience, denote $\sup_{0 \le t \le 1}|g(t)|$ by
$\|g\|$ and note that, for any sequence of processes $\{Z_n(t)\}$ defined on
[0,1],

    $$(\log n)^{1/2}[\,\|Y_n\| - d_n\,]
      = (\log n)^{1/2}[\,\|Z_n\| - d_n\,] + (\log n)^{1/2}[\,\|Y_n\| - \|Z_n\|\,].$$

Thus, if we show that

    $$(\log n)^{1/2}\|Z_n - Y_n\| \xrightarrow{P} 0$$

and

    $$(\log n)^{1/2}[\,\|Z_n\| - d_n\,]$$

converges in law, then

    $$(\log n)^{1/2}[\,\|Y_n\| - d_n\,]$$

also converges in law, and has the same limiting distribution.

We will apply the preceding remark to eventually "approximate"
the process $Y_n$ with the process $Z_n$ of Theorem 3.2.2, thus obtaining
the desired result. We will proceed through several stages of such
approximation, and the details will be given in the sequence of lemmas
which immediately follows the proof of this theorem.

We first note that $Y_n$ may be written as

(3.2.3)
    $$Y_n(t) = [s(t)f(t)]^{-1/2}\epsilon_n^{-1/2}\iint yK((t-x)/\epsilon_n)\,dZ_n(x,y),$$

where $Z_n$ is the empirical process defined by

    $$Z_n(x,y) = n^{1/2}[\,F_n(x,y) - F(x,y)\,].$$

Now define the following processes on [0,1]:

(3.2.4)
    $$Y_{0,n}(t) = [s(t)f(t)]^{-1/2}\epsilon_n^{-1/2}\iint_{|y| \le a_n} yK((t-x)/\epsilon_n)\,dZ_n(x,y),$$

(3.2.5)
    $$Y_{1,n}(t) = [s_n(t)f(t)]^{-1/2}\epsilon_n^{-1/2}\iint_{|y| \le a_n} yK((t-x)/\epsilon_n)\,dZ_n(x,y),$$

where

    $$s_n(t) = E[\,Y^2I_{\{|Y| \le a_n\}} \mid X=t\,],$$

(3.2.6)
    $$Y_{2,n}(t) = [s_n(t)f(t)]^{-1/2}\epsilon_n^{-1/2}\iint_{|y| \le a_n} yK((t-x)/\epsilon_n)\,dB_n(T(x,y)),$$

where $\{B_n\}$ is a sequence of Brownian bridges as in Theorem 3.2.3 and
$T: \mathbb{R}^2 \to [0,1]^2$ is the transformation defined by

(3.2.7)
    $$T(x,y) = (F_{X|Y}(x|y),\ F_Y(y)),$$

(3.2.8)
    $$Y_{3,n}(t) = [s_n(t)f(t)]^{-1/2}\epsilon_n^{-1/2}\iint_{|y| \le a_n} yK((t-x)/\epsilon_n)\,dW_n(T(x,y)),$$

where $\{W_n\}$ is a sequence of independent Wiener processes used in
constructing $\{B_n\}$ as

    $$B_n(u,s) = W_n(u,s) - us\,W_n(1,1), \quad 0 \le u, s \le 1$$

(Révész (1976)),

(3.2.9)
    $$Y_{4,n}(t) = [s_n(t)f(t)]^{-1/2}\epsilon_n^{-1/2}\int [s_n(x)f(x)]^{1/2}K((t-x)/\epsilon_n)\,dW(x),$$

(3.2.10)
    $$Y_{5,n}(t) = \epsilon_n^{-1/2}\int K((t-x)/\epsilon_n)\,dW(x),$$

where W is a Wiener process on $(-\infty,\infty)$.
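
The relation $B_n(u,s) = W_n(u,s) - us\,W_n(1,1)$ between the two-parameter Wiener process and the Brownian bridge can be illustrated on a grid. The sketch below simulates a Wiener sheet on $[0,1]^2$ by cumulating independent normal increments over grid cells and then forms the bridge. The grid-based construction and the function names are assumptions made for illustration.

```python
import numpy as np

def wiener_sheet(k, rng):
    """Simulate W on the grid {i/k} x {j/k}, i, j = 0..k.

    Each of the k*k cells has area 1/k^2, so its increment is N(0, 1/k^2);
    W(u, s) is the sum of the increments over the rectangle [0, u] x [0, s].
    """
    inc = rng.normal(scale=1.0 / k, size=(k, k))
    W = np.zeros((k + 1, k + 1))
    W[1:, 1:] = inc.cumsum(axis=0).cumsum(axis=1)
    return W

def brownian_bridge(W):
    """B(u, s) = W(u, s) - u s W(1, 1) on the same grid."""
    k = W.shape[0] - 1
    g = np.arange(k + 1) / k
    return W - np.outer(g, g) * W[-1, -1]

rng = np.random.default_rng(2)
W = wiener_sheet(50, rng)
B = brownian_bridge(W)
```

By construction the simulated bridge vanishes along the axes and at the corner (1,1), matching the tied-down behaviour of the empirical process of a uniform sample on the unit square.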


We have, by Lemma 3.2.4,

    $$\|Y_n - Y_{0,n}\| = o_p((\log n)^{-1/2}),$$

where $o_p(b_n)$ refers to a sequence of random variables $A_n$ such that
$A_n/b_n \to 0$ in probability. Lemma 3.2.8 gives

    $$\|Y_{0,n} - Y_{1,n}\| = o_p((\log n)^{-1/2}).$$

By Lemma 3.2.5,

    $$\|Y_{1,n} - Y_{2,n}\| = O(a_n\,\epsilon_n^{-1/2}\,n^{-1/6}(\log n)^{3/2}) \quad \text{a.s.},$$

and by A2,

    $$a_n\,\epsilon_n^{-1/2}\,n^{-1/6}(\log n)^2 \to 0,$$

so that

    $$\|Y_{1,n} - Y_{2,n}\| = o_p((\log n)^{-1/2}).$$

By Lemma 3.2.6,

    $$\|Y_{2,n} - Y_{3,n}\| = O_p(\epsilon_n^{1/2}) = o_p((\log n)^{-1/2}),$$

since $\epsilon_n = n^{-\delta}$ and hence $\epsilon_n\log n \to 0$.


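Now $Y_{3,n}$ is a zero mean Gaussian process on [0,1] with covariance
function

(3.2.11)
    $$r(t_1,t_2) = EY_{3,n}(t_1)Y_{3,n}(t_2)
      = [s_n(t_1)f(t_1)]^{-1/2}[s_n(t_2)f(t_2)]^{-1/2}\,
        \epsilon_n^{-1}\iint_{|y| \le a_n} y^2K((t_1-x)/\epsilon_n)\,K((t_2-x)/\epsilon_n)\,f(x,y)\,dx\,dy$$

(cf. Section 3.1). The double integral term on the right hand side of
(3.2.11), including the factor $\epsilon_n^{-1}$, may be written as

    $$\epsilon_n^{-1}E[\,Y^2I_{\{|y| \le a_n\}}(Y)\,K((t_1-X)/\epsilon_n)\,K((t_2-X)/\epsilon_n)\,]
      = \epsilon_n^{-1}\int s_n(x)f(x)\,K((t_1-x)/\epsilon_n)\,K((t_2-x)/\epsilon_n)\,dx.$$

Thus the process $Y_{3,n}$ is a Gaussian process with the same covariance
function as $Y_{4,n}$, i.e., they have the same finite dimensional
distributions. Hence the asymptotic distribution of

    $$(2\delta\log n)^{1/2}\Big[\frac{\|Y_{3,n}\|}{[\lambda(K)]^{1/2}} - d_n\Big]$$

is the same as that of

    $$(2\delta\log n)^{1/2}\Big[\frac{\|Y_{4,n}\|}{[\lambda(K)]^{1/2}} - d_n\Big].$$

Further, by Lemma 3.2.7,

    $$\|Y_{4,n} - Y_{5,n}\| = O_p(\epsilon_n^{1/2}) = o_p((\log n)^{-1/2}).$$

By Theorem 3.2.2,

    $$(2\delta\log n)^{1/2}\Big[\frac{\|Y_{5,n}\|}{[\lambda(K)]^{1/2}} - d_n\Big]$$

has the desired limit distribution, and the theorem is proved.


3.2.4  Lemma. If A1 is satisfied and

    $$g(x) = s(x)f(x) = \int y^2f(x,y)\,dy$$

is bounded away from zero on [0,1], then

    $$\|Y_n - Y_{0,n}\| = o_p((\log n)^{-1/2}).$$

Proof. Note

    $$Y_n(t) - Y_{0,n}(t) = [g(t)]^{-1/2}\epsilon_n^{-1/2}\iint_{|y|>a_n} yK((t-x)/\epsilon_n)\,dZ_n(x,y),$$

so that

    $$\|Y_n - Y_{0,n}\| \le \|g^{-1/2}\| \cdot
      \Big\|\epsilon_n^{-1/2}\iint_{|y|>a_n} yK((t-x)/\epsilon_n)\,dZ_n(x,y)\Big\|.$$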


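By assumption,

    $$\|g^{-1/2}\| < \infty,$$

and thus it suffices to prove that

    $$(\log n)^{1/2}\sup_{0 \le t \le 1}\Big|\epsilon_n^{-1/2}\iint_{|y|>a_n} yK((t-x)/\epsilon_n)\,dZ_n(x,y)\Big|
      \xrightarrow{P} 0.$$

Now

(3.2.12)
    $$(\log n)^{1/2}\epsilon_n^{-1/2}\iint_{|y|>a_n} yK((t-x)/\epsilon_n)\,dZ_n(x,y)$$
    $$= (\log n)^{1/2}(n\epsilon_n)^{-1/2}\sum_{i=1}^n
        \{\,Y_iI_{\{|y|>a_n\}}(Y_i)K((t-X_i)/\epsilon_n) - EY_iI_{\{|y|>a_n\}}(Y_i)K((t-X_i)/\epsilon_n)\,\}$$
    $$= \sum_{i=1}^n X_{n,i}(t) = U_n(t),$$

say, where $X_{n,i}(t)$, $i = 1,\ldots,n$, are i.i.d. with

    $$EX_{n,i}(t) = 0$$

for each $t \in [0,1]$. Thus

(3.2.13)
    $$EU_n^2(t) = nEX_{n,1}^2(t)$$

and

(3.2.14)
    $$EX_{n,1}^2(t) \le (\log n)(n\epsilon_n)^{-1}E[\,Y_1^2I_{\{|y|>a_n\}}(Y_1)K^2((t-X_1)/\epsilon_n)\,]
      \le M(\log n)(n\epsilon_n)^{-1}E[\,Y_1^2I_{\{|y|>a_n\}}(Y_1)\,],$$

where M is an upper bound for $K^2$. Hence, by A1,

(3.2.15)
    $$EU_n^2(t) \le M(\log n)\,\epsilon_n^{-1}\int_{|y|>a_n} y^2f_Y(y)\,dy \to 0 \quad \text{as } n \to \infty,$$

and it remains to show that

(3.2.16)
    $$\|U_n\| \xrightarrow{P} 0.$$

To this end,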

we note that $U_n(t)$ is an element of the space D[0,1] of right continuous
functions with left hand limits for each n, and that, if we show that
$U_n$ converges weakly to the zero element of D[0,1], then (3.2.16) will
follow, since $\|\cdot\|$ is a continuous functional on D[0,1]. Since (3.2.15)
implies

    $$(U_n(t_1), \ldots, U_n(t_k)) \xrightarrow{P} 0$$

in $\mathbb{R}^k$ for distinct points $t_1,t_2,\ldots,t_k$ of [0,1], it suffices to verify
the following moment condition to show weak convergence of $U_n$
(Billingsley (1968), Th. 15.6):

    $$E\{\,|U_n(t) - U_n(t_1)|\,|U_n(t_2) - U_n(t)|\,\} \le B(t_2 - t_1)^2,$$

where B is a constant.

By the Schwarz inequality,

    $$E\{\,|U_n(t) - U_n(t_1)|\,|U_n(t_2) - U_n(t)|\,\}
      \le \{\,E[U_n(t) - U_n(t_1)]^2 \cdot E[U_n(t_2) - U_n(t)]^2\,\}^{1/2}.$$

Defining

    $$G_n(u,s,X) = K((u-X)/\epsilon_n) - K((s-X)/\epsilon_n),$$

we have

    $$\{E[U_n(t) - U_n(t_1)]^2\}^{1/2}$$
    $$= (\log n)^{1/2}(n\epsilon_n)^{-1/2}\Big\{E\Big(\sum_{i=1}^n
        [\,Y_iI_{\{|y|>a_n\}}(Y_i)G_n(t,t_1,X_i) - EY_iI_{\{|y|>a_n\}}(Y_i)G_n(t,t_1,X_i)\,]\Big)^2\Big\}^{1/2}$$
    $$= (\log n)^{1/2}(n\epsilon_n)^{-1/2}\Big\{\sum_{i=1}^n
        E[\,Y_iI_{\{|y|>a_n\}}(Y_i)G_n(t,t_1,X_i) - EY_iI_{\{|y|>a_n\}}(Y_i)G_n(t,t_1,X_i)\,]^2\Big\}^{1/2}$$
    $$\le (\log n)^{1/2}(n\epsilon_n)^{-1/2}\Big\{\sum_{i=1}^n
        E(\,Y_i^2I_{\{|y|>a_n\}}(Y_i)G_n^2(t,t_1,X_i)\,)\Big\}^{1/2}.$$

Since K has a bounded derivative on [-A,A], it satisfies a Lipschitz
condition:

    $$|K(u) - K(s)| \le B_1|u - s|,$$

where $B_1$ is a constant. Thus

    $$\{E[U_n(t) - U_n(t_1)]^2\}^{1/2}
      \le B_1(\log n)^{1/2}\epsilon_n^{-3/2}|t - t_1|\,\{E\,Y_1^2I_{\{|y|>a_n\}}(Y_1)\}^{1/2}$$
    $$= B_1(\log n)^{1/2}\epsilon_n^{-3/2}|t - t_1|\,\Big\{\int_{|y|>a_n} y^2f_Y(y)\,dy\Big\}^{1/2}.$$

Applying the same argument to

    $$E[U_n(t_2) - U_n(t)]^2$$

yields

    $$E\{\,|U_n(t) - U_n(t_1)|\,|U_n(t_2) - U_n(t)|\,\}
      \le B_1^2(\log n)\,\epsilon_n^{-3}\,|t - t_1|\,|t_2 - t|\int_{|y|>a_n} y^2f_Y(y)\,dy
      \le B_1^2\,c\,(t_2 - t_1)^2$$

by A1 and using the fact that $t_1 \le t \le t_2$. The moment condition is
therefore satisfied, and the result follows.  □


Before going on to Lemma 3.2.5, we state the useful integration by parts formula for Riemann-Stieltjes integrals on rectangles of $\mathbb{R}^2$.

Let $f$ and $g$ be two functions defined on $[0,1]^2$. If all of the integrals below exist and are finite, then we have

(3.2.17)
$$\int_0^1\!\!\int_0^1 f(x,y)\,dg(x,y) = \int_0^1\!\!\int_0^1 g(x,y)\,df(x,y)
+ \int_0^1 f(1,y)\,dg(1,y) - \int_0^1 f(0,y)\,dg(0,y)
+ \int_0^1 g(x,1)\,df(x,1) - \int_0^1 g(x,0)\,df(x,0).$$

We note that if $g(x,y)$ is a Wiener process on $[0,1]^2$ and $f(x,y)$ is a measurable function on $[0,1]^2$ such that $\int_0^1\!\int_0^1 f(x,y)\,dg(x,y)$ exists, then (3.2.17) remains valid provided the integrals on the right hand side of (3.2.17) also exist.


3.2.5 Lemma. If $K$ is absolutely continuous on $[-A,A]$ and zero outside $[-A,A]$, then

$$\|Y_{1,n} - Y_{2,n}\| = O_p\big(a_n\varepsilon_n^{-1/2}n^{-1/6}(\log n)^{3/2}\big).$$


Proof. First we note that the random pair

(3.2.18)  $$(X',Y') = T(X,Y),$$

where $T: \mathbb{R}^2 \to [0,1]^2$ is defined by (3.2.7), is uniformly distributed on $[0,1]^2$, that $X'$ and $Y'$ are independent, and that $Z_n(T^{-1}(x',y'))$, $0 \le x', y' \le 1$, is the empirical process of $(X',Y')$ (Rosenblatt (1952)). Theorem 3.2.3 thus applies to $(X',Y')$, and we may conclude that

$$\sup_{0\le x',y'\le 1}|B_n(x',y') - Z_n(T^{-1}(x',y'))| = O(n^{-1/6}(\log n)^{3/2}) \quad \text{a.s.},$$

or, equivalently,

(3.2.19)  $$\sup_{x,y\in\mathbb{R}}|B_n(T(x,y)) - Z_n(x,y)| = O(n^{-1/6}(\log n)^{3/2}) \quad \text{a.s.}$$

Applying the integration by parts formula (3.2.17), we have

(3.2.20)
$$\iint_{|y|\le a_n} yK\Big(\frac{t-x}{\varepsilon_n}\Big)\,dZ_n(x,y)
= \int_{u=-A}^{A}\int_{y=-a_n}^{a_n} yK(u)\,dZ_n(t-\varepsilon_n u,y)$$

$$= \int_{-A}^{A}\int_{-a_n}^{a_n} Z_n(t-\varepsilon_n u,y)\,d[yK(u)]
+ a_n\int_{u=-A}^{A} Z_n(t-\varepsilon_n u,a_n)\,dK(u)
+ a_n\int_{u=-A}^{A} Z_n(t-\varepsilon_n u,-a_n)\,dK(u)$$

$$+ K(A)\int_{y=-a_n}^{a_n} y\,dZ_n(t-\varepsilon_n A,y)
- K(-A)\int_{y=-a_n}^{a_n} y\,dZ_n(t+\varepsilon_n A,y).$$

The second to last integral above may be written, using ordinary (one-dimensional) integration by parts,

$$\int_{y=-a_n}^{a_n} y\,dZ_n(t-\varepsilon_n A,y) = a_n Z_n(t-\varepsilon_n A,a_n) + a_n Z_n(t-\varepsilon_n A,-a_n) - \int_{y=-a_n}^{a_n} Z_n(t-\varepsilon_n A,y)\,dy,$$

and similarly for the last integral on the right hand side of (3.2.20).

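By using a similar argument, we obtain (where the integrals are defined in the $L_2$ sense)

(3.2.21)
$$\iint_{|y|\le a_n} yK\Big(\frac{t-x}{\varepsilon_n}\Big)\,dB_n(T(x,y))
= \int_{u=-A}^{A}\int_{y=-a_n}^{a_n} B_n(T(t-\varepsilon_n u,y))\,d[yK(u)]$$

$$+ a_n\int_{u=-A}^{A} B_n(T(t-\varepsilon_n u,a_n))\,dK(u)
+ a_n\int_{u=-A}^{A} B_n(T(t-\varepsilon_n u,-a_n))\,dK(u)$$

$$+ K(A)\Big\{-\int_{y=-a_n}^{a_n} B_n(T(t-\varepsilon_n A,y))\,dy + a_n B_n(T(t-\varepsilon_n A,a_n)) + a_n B_n(T(t-\varepsilon_n A,-a_n))\Big\}$$

$$- K(-A)\Big\{-\int_{y=-a_n}^{a_n} B_n(T(t+\varepsilon_n A,y))\,dy + a_n B_n(T(t+\varepsilon_n A,a_n)) + a_n B_n(T(t+\varepsilon_n A,-a_n))\Big\}.$$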


Subtracting (3.2.21) from (3.2.20) and using (3.2.19) and the assumption that $K$ is absolutely continuous on $[-A,A]$, we obtain

(3.2.22)
$$\sup_{0\le t\le 1}\varepsilon_n^{1/2}[g_n(t)]^{1/2}|Y_{1,n}(t) - Y_{2,n}(t)|
= O(n^{-1/6}(\log n)^{3/2})\cdot\Big\{4a_n\int_{-A}^{A}|K'(u)|\,du + 4a_n[K(A) + K(-A)]\Big\} \quad \text{a.s.}$$

$$= O(a_n n^{-1/6}(\log n)^{3/2}) \quad \text{a.s.},$$

since

$$\int_{-A}^{A}|K'(u)|\,du < \infty.$$

Thus, since $\|g_n^{-1/2}\|$ is a bounded sequence by assumption,

$$\|Y_{1,n} - Y_{2,n}\| = O_p\big(a_n\varepsilon_n^{-1/2}n^{-1/6}(\log n)^{3/2}\big),$$

and the proof is complete. □


We may write the sequence of Brownian bridges $\{B_n\}$ of Theorem 3.2.3 as

(3.2.23)  $$B_n(x,y) = W_n(x,y) - xyW_n(1,1),$$

$0 \le x,y \le 1$, where $\{W_n\}$ is a sequence of independent Wiener processes on $[0,1]^2$ (Révész (1976)). The next lemma shows that, for our purposes, the only significant part of (3.2.23) is $W_n(x,y)$.
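The representation (3.2.23) can be checked numerically. The following Python sketch is an illustration only and is not part of the original argument. It approximates a Wiener process on $[0,1]^2$ by cumulative sums of independent Gaussian increments on an m-by-m grid, forms $B(x,y) = W(x,y) - xyW(1,1)$, and compares the Monte Carlo variance at $(1/2,1/2)$ with the value $xy - x^2y^2$ implied by (3.2.23). The grid size, the function names and the use of NumPy are assumptions made here.

```python
import numpy as np

def wiener_sheet(m, rng):
    """Approximate a Wiener process on [0,1]^2 at the grid points (i/m, j/m), i, j = 1..m."""
    # Each grid cell has area 1/m^2, so its increment is N(0, 1/m^2).
    increments = rng.standard_normal((m, m)) / m
    return increments.cumsum(axis=0).cumsum(axis=1)

def bridge_from_sheet(w):
    """Apply B(x,y) = W(x,y) - x*y*W(1,1) on the same grid."""
    m = w.shape[0]
    grid = np.arange(1, m + 1) / m
    return w - np.outer(grid, grid) * w[-1, -1]

def bridge_variance_at_center(m=20, reps=4000, seed=0):
    """Monte Carlo variance of B(1/2, 1/2); m is assumed even."""
    rng = np.random.default_rng(seed)
    k = m // 2 - 1  # index of the grid point x = y = 1/2
    values = np.array([bridge_from_sheet(wiener_sheet(m, rng))[k, k]
                       for _ in range(reps)])
    return values.var()

if __name__ == "__main__":
    x = y = 0.5
    print("simulated variance:", bridge_variance_at_center())
    print("xy - (xy)^2       :", x * y - (x * y) ** 2)
```

By construction the simulated bridge vanishes at $(1,1)$, which is the tie-down that distinguishes $B_n$ from $W_n$ in (3.2.23).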


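3.2.6 Lemma. If A4 holds, then

$$\|Y_{2,n} - Y_{3,n}\| = O_p(\varepsilon_n^{1/2}).$$

Proof. By definition of $Y_{2,n}$ and $Y_{3,n}$ we have

$$|Y_{2,n}(t) - Y_{3,n}(t)| = \Big|[g_n(t)]^{-1/2}\varepsilon_n^{-1/2}\iint_{|y|\le a_n} yK\Big(\frac{t-x}{\varepsilon_n}\Big)f(x,y)\,dx\,dy\Big|\cdot|W_n(1,1)|,$$

since the Jacobian of the transformation $T$ is $f(x,y)$. Thus

$$\varepsilon_n^{-1/2}\|Y_{2,n} - Y_{3,n}\|
\le |W_n(1,1)|\,\|g_n^{-1/2}\|\cdot\sup_{0\le t\le 1}\varepsilon_n^{-1}\iint_{|y|\le a_n}\Big|yK\Big(\frac{t-x}{\varepsilon_n}\Big)\Big|f(x,y)\,dx\,dy$$

$$\le |W_n(1,1)|\,\|g_n^{-1/2}\|\cdot\sup_{0\le t\le 1}\varepsilon_n^{-1}\int\Big[\int|y|f(x,y)\,dy\Big]\Big|K\Big(\frac{t-x}{\varepsilon_n}\Big)\Big|\,dx.$$

By A4,

$$h(x) = \int|y|f(x,y)\,dy$$

is a bounded function and $\|g_n^{-1/2}\|$ is a bounded sequence, so that for some constant $M$ we have

$$\varepsilon_n^{-1/2}\|Y_{2,n} - Y_{3,n}\|
\le |W_n(1,1)|\,M\,\varepsilon_n^{-1}\int\Big|K\Big(\frac{t-x}{\varepsilon_n}\Big)\Big|\,dx
= |W_n(1,1)|\,M\int|K(u)|\,du
= O_p(1).$$

Thus

$$\|Y_{2,n} - Y_{3,n}\| = O_p(\varepsilon_n^{1/2}),$$

and the proof is complete. □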


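3.2.7 Lemma. Under the assumptions of Theorem 3.2.1,

$$\|Y_{4,n} - Y_{5,n}\| = O_p(\varepsilon_n^{1/2}).$$

Proof. By definition,

(3.2.24)
$$|Y_{4,n}(t) - Y_{5,n}(t)|
= \varepsilon_n^{-1/2}\Big|\int\Big\{\Big[\frac{g_n(x)}{g_n(t)}\Big]^{1/2} - 1\Big\}K\Big(\frac{t-x}{\varepsilon_n}\Big)\,dW(x)\Big|
= \varepsilon_n^{-1/2}\Big|\int_{-A}^{A}\Big\{\Big[\frac{g_n(t-u\varepsilon_n)}{g_n(t)}\Big]^{1/2} - 1\Big\}K(u)\,dW(t-u\varepsilon_n)\Big|.$$

By using integration by parts and the assumptions that $g_n$ and $K$ are absolutely continuous, we may bound the integral on the extreme right hand side of (3.2.24) with

(3.2.25)
$$\varepsilon_n^{-1/2}\Big|\int_{-A}^{A} W(t-u\varepsilon_n)\,\frac{d}{du}\Big\{\Big(\Big[\frac{g_n(t-u\varepsilon_n)}{g_n(t)}\Big]^{1/2} - 1\Big)K(u)\Big\}\,du\Big|$$

$$+ \varepsilon_n^{-1/2}\Big|K(A)W(t-A\varepsilon_n)\Big\{\Big[\frac{g_n(t-A\varepsilon_n)}{g_n(t)}\Big]^{1/2} - 1\Big\}\Big|
+ \varepsilon_n^{-1/2}\Big|K(-A)W(t+A\varepsilon_n)\Big\{\Big[\frac{g_n(t+A\varepsilon_n)}{g_n(t)}\Big]^{1/2} - 1\Big\}\Big|$$

$$= J_{1,n}(t) + J_{2,n}(t) + J_{3,n}(t),$$

say. We will show that the supremum over $[0,1]$ of each of these three terms is $O_p(\varepsilon_n^{1/2})$, thus completing the proof.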

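First of all, note

$$\varepsilon_n^{-1/2}\|J_{2,n}\| \le K(A)\sup_{0\le t\le 1}|W(t-A\varepsilon_n)|\cdot\sup_{0\le t\le 1}\varepsilon_n^{-1}\Big|\Big[\frac{g_n(t-A\varepsilon_n)}{g_n(t)}\Big]^{1/2} - 1\Big|$$

and

$$\sup_{0\le t\le 1}|W(t-A\varepsilon_n)| = O_p(1).$$

Now

$$\sup_{0\le t\le 1}\varepsilon_n^{-1}\Big|\Big[\frac{g_n(t-A\varepsilon_n)}{g_n(t)}\Big]^{1/2} - 1\Big|
= \sup_{0\le t\le 1}\varepsilon_n^{-1}\frac{\big|[g_n(t-A\varepsilon_n)]^{1/2} - [g_n(t)]^{1/2}\big|}{|g_n(t)|^{1/2}}
\le \|g_n^{-1/2}\|\,\varepsilon_n^{-1}\sup_{0\le t\le 1}\big|[g_n(t-A\varepsilon_n)]^{1/2} - [g_n(t)]^{1/2}\big|.$$

By assumption, $\|g_n^{-1/2}\|$ is a bounded sequence, and by the mean value theorem,

$$\varepsilon_n^{-1}\big|[g_n(t-A\varepsilon_n)]^{1/2} - [g_n(t)]^{1/2}\big| = \tfrac{1}{2}\,\varepsilon_n^{-1}|g_n(t-A\varepsilon_n) - g_n(t)|\cdot|x_n(t,A)|^{-1/2},$$

where $x_n(t,A)$ is between $g_n(t-A\varepsilon_n)$ and $g_n(t)$. Applying the mean value theorem to $g_n$ yields

$$g_n(t-A\varepsilon_n) - g_n(t) = -A\varepsilon_n g_n'(t_n(t,A)),$$

where $t_n(t,A)$ is between $t - A\varepsilon_n$ and $t$. Thus

$$\varepsilon_n^{-1/2}\|J_{2,n}\| \le K(A)\sup_{-A\le t\le 1}|W(t)|\cdot\frac{A}{2}\sup_{0\le t\le 1}\frac{|g_n'(t_n(t,A))|}{|x_n(t,A)|^{1/2}} = O_p(1),$$

since $g_n'$ is uniformly bounded and $g_n$ is bounded away from zero, by assumption, and thus

$$\|J_{2,n}\| = O_p(\varepsilon_n^{1/2}).$$

A similar argument shows that

$$\|J_{3,n}\| = O_p(\varepsilon_n^{1/2}),$$

so we now consider $J_{1,n}$.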
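Carrying out the differentiation in the integrand of $J_{1,n}$, we have

$$\varepsilon_n^{-1/2}J_{1,n}(t) \le \varepsilon_n^{-1}\Big|\int_{-A}^{A} W(t-u\varepsilon_n)\Big\{K'(u)\Big[\Big(\frac{g_n(t-u\varepsilon_n)}{g_n(t)}\Big)^{1/2} - 1\Big]\Big\}\,du\Big|$$

$$+ \frac{1}{2}\Big|\int_{-A}^{A} W(t-u\varepsilon_n)K(u)\Big(\frac{g_n(t-u\varepsilon_n)}{g_n(t)}\Big)^{-1/2}\Big(\frac{g_n'(t-u\varepsilon_n)}{g_n(t)}\Big)\,du\Big|
= C_{1,n}(t) + C_{2,n}(t),$$

say. Now the non-stochastic terms in the integrand of $C_{2,n}$ are uniformly bounded in their arguments and in $n$, by assumption. We therefore have

$$\|C_{2,n}\| \le C_2\sup_{0\le t\le 1}\int_{-A}^{A}|W(t-u\varepsilon_n)|\,du = O_p(1),$$

where $C_2$ is a constant. For $C_{1,n}$, apply the same argument used in considering $J_{2,n}$ to conclude that

$$\sup_{0\le t\le 1}\varepsilon_n^{-1}\Big|\Big[\frac{g_n(t-u\varepsilon_n)}{g_n(t)}\Big]^{1/2} - 1\Big| \le C_1|u|,$$

where $C_1$ is a constant. Then

$$\|C_{1,n}\| \le C_1\sup_{0\le t\le 1}\int_{-A}^{A}|W(t-u\varepsilon_n)K'(u)u|\,du = O_p(1),$$

and the proof is complete. □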

We now use the results proved thus far in showing that $Y_{0,n}$ and $Y_{1,n}$ are sufficiently close to one another.

3.2.8 Lemma. Under the assumptions of Theorem 3.2.1,

$$\|Y_{0,n} - Y_{1,n}\| = o_p((\log n)^{-1/2}).$$

Proof. We must show that

$$\sup_{0\le t\le 1}\Big\{\big|[g(t)]^{-1/2} - [g_n(t)]^{-1/2}\big|\,\varepsilon_n^{-1/2}\Big|\iint_{|y|\le a_n} yK\Big(\frac{t-x}{\varepsilon_n}\Big)\,dZ_n(x,y)\Big|\Big\} = o_p((\log n)^{-1/2}).$$

By the preceding four lemmas and Theorem 3.2.2,

$$(\log n)^{1/2}\big[\|Y_{1,n}\|[\lambda(K)]^{-1/2} - d_n\big]$$

converges in distribution to some random variable, and is therefore an $O_p(1)$ sequence. Since, by definition,

$$d_n = O((\log n)^{1/2}),$$

we have

$$\|Y_{1,n}\| = O_p((\log n)^{1/2}),$$

and since $\|g_n^{1/2}\|$ is a bounded sequence, we have

$$\sup_{0\le t\le 1}\varepsilon_n^{-1/2}\Big|\iint_{|y|\le a_n} yK\Big(\frac{t-x}{\varepsilon_n}\Big)\,dZ_n(x,y)\Big| = O_p((\log n)^{1/2}).$$

Thus it suffices to prove

$$(\log n)\,\|g_n^{-1/2} - g^{-1/2}\| \to 0$$

as $n \to \infty$. By the mean value theorem,

$$|g_n^{-1/2} - g^{-1/2}| = \tfrac{1}{2}\,|g_n - g|\cdot|h_n^{-3/2}|,$$

where $h_n$ is between $g_n$ and $g$. Since $g_n$ and $g$ are bounded away from zero, $\|h_n^{-3/2}\|$ is a bounded sequence, and since, by A3,

$$(\log n)\,\|g_n - g\| \to 0,$$

the result is proved. □

Since $m_n^*(t)$ is an asymptotically unbiased estimator of $m^*(t) = m(t)f(t)$, it is natural to seek conditions under which $E\,m_n^*(t)$ may be replaced by $m^*(t)$ in Theorem 3.2.1. Define the process

$$Y_n'(t) = (n\varepsilon_n)^{1/2}\,\frac{m_n^*(t) - m^*(t)}{[s(t)f(t)]^{1/2}}.$$

Then we have the following corollary to Theorem 3.2.1.


3.2.9 Corollary. Suppose all the conditions of Theorem 3.2.1 hold and in addition

$$\varepsilon_n = n^{-\delta}, \qquad 1/5 < \delta < 1/2,$$

$K$ satisfies

$$\int uK(u)\,du = 0, \qquad \int u^2K(u)\,du < \infty,$$

and the function

$$m^*(t) = m(t)f(t) = \int yf(t,y)\,dy$$

has bounded, continuous first and second derivatives. Then the conclusion of Theorem 3.2.1 holds, with $Y_n'$ replacing $Y_n$.

Proof. According to the remark at the beginning of the proof of Theorem 3.2.1, it suffices to show

$$\|Y_n' - Y_n\| = o_p((\log n)^{-1/2}).$$

But

$$\|Y_n' - Y_n\| \le (n\varepsilon_n)^{1/2}\,\|m^* - E\,m_n^*\|\cdot\|g^{-1/2}\|.$$

By assumption,

$$\|g^{-1/2}\| < \infty,$$

and we know that, under the assumptions on $m^*$ and $K$,

$$\|m^* - E\,m_n^*\| = O(\varepsilon_n^2).$$

Since

$$\varepsilon_n = n^{-\delta}, \qquad \delta > 1/5,$$

then

$$\varepsilon_n^2(n\varepsilon_n)^{1/2}(\log n)^{1/2} = (n\varepsilon_n^5\log n)^{1/2} \to 0,$$

and the proof is complete. □

Based on this corollary, we may construct a confidence band for $m(t)$, $0 \le t \le 1$, as follows. Using the asymptotic distribution, we have

$$P\Big\{(2\delta\log n)^{1/2}\Big[\frac{\sup_{0\le t\le 1}|Y_n'(t)|}{[\lambda(K)]^{1/2}} - d_n\Big] < C(\alpha)\Big\} \approx 1-\alpha,$$

where

$$C(\alpha) = \log 2 - \log|\log(1-\alpha)|.$$

Inverting the above expression in the usual way, we obtain as a $(1-\alpha)\cdot 100\%$ confidence band for $m(t)$:

(3.2.26)
$$\hat m_n(t) \pm (n\varepsilon_n)^{-1/2}\Big[\frac{s(t)}{f(t)}\Big]^{1/2}\Big[\frac{C(\alpha)}{(2\delta\log n)^{1/2}} + d_n\Big][\lambda(K)]^{1/2},$$
$0 \le t \le 1$.

and thus

$$(n\varepsilon_n)^{1/2}\sup_{0\le t\le 1}\frac{|m_n^*(t) - m^*(t)|}{[g(t)]^{1/2}}
= O_p((\log n)^{-1/2}) + O((\log n)^{1/2})
= O_p((\log n)^{1/2}).$$

Now, using the assumption that $g(t) = s(t)f(t)$ is bounded away from zero, the conclusion follows. □

We now use the preceding lemma to show uniform consistency of $\hat m_n$ and $m_n$.

3.3.2 Theorem. Under the conditions of Corollary 3.2.9, we have

(3.3.2)  $$\|\hat m_n - m\| = O_p[(\log n)^{1/2}(n\varepsilon_n)^{-1/2}],$$

(3.3.3)  $$\|m_n - m\| = O_p[(\log n)^{1/2}(n\varepsilon_n)^{-1/2}].$$

Proof. Note that

$$\|\hat m_n - m\| \le \|f^{-1}\|\cdot\|m_n^* - m^*\|,$$

where

$$m^*(t) = f(t)m(t).$$

By assumption, $f$ is bounded away from zero on $[0,1]$, and thus $\|f^{-1}\| < \infty$. An application of Lemma 3.3.1 thus proves (3.3.2).

For (3.3.3), note

$$\|m_n - m\| = \Big\|\frac{m_n^*f - f_nm^*}{f_nf}\Big\|
\le \Big\|\frac{m_n^*f - m_n^*f_n}{f_nf}\Big\| + \Big\|\frac{m_n^*f_n - m^*f_n}{f_nf}\Big\|
= A + B,$$

say. Now

$$B = \|\hat m_n - m\| = O_p[(\log n)^{1/2}(n\varepsilon_n)^{-1/2}]$$

by (3.3.2). Further,

$$A \le \Big\|\frac{m_n^*}{f_n}\Big\|\cdot\|f^{-1}\|\cdot\|f - f_n\|, \qquad \|f - f_n\| = O_p[(\log n)^{1/2}(n\varepsilon_n)^{-1/2}]$$

(Bickel and Rosenblatt (1973)). Since

$$\Big\|\frac{m_n^*}{f_n}\Big\| \le \|m_n^*\|\cdot\Big[\inf_{0\le t\le 1}|f_n(t)|\Big]^{-1}$$

and it is easily verified that $\|m_n^*\| \to_p \|m^*\|$ and $\inf_{0\le t\le 1}|f_n(t)| \to_p \inf_{0\le t\le 1}|f(t)| > 0$, (3.3.3) follows. □

4.  AN  EXAMPLE,  FURTHER  RESEARCH 


As we noted in the introductory chapter, if the density of $X$ is known, then either the estimator $m_n = m_n^*/f_n$ or $\hat m_n = m_n^*/f$ may be used to estimate the regression function. Here we will summarize some results given in Chapter 2 which relate to the relative performance of $m_n$ and $\hat m_n$ in this case. We then present an example in which $m_n$ and $\hat m_n$ are computed from a set of simulated data.

4.1 The Estimators $m_n$ and $\hat m_n$.

We first note that, according to Theorem 2.3.4, if the density function of $X$ has, say, an interval for its support and is non-zero at the endpoints of the interval, then $m_n$ is a consistent estimator at the endpoints, whereas $\hat m_n$ is not. The implication of this for finite sample sizes is that $\hat m_n$ is likely to display a bias near the endpoints of the range of the $X$ variable which $m_n$ will not have.

According to Theorems 2.4.4 and 2.5.2, under appropriate conditions, both $m_n(x)$ and $\hat m_n(x)$ have asymptotic normal distributions with mean $m(x)$ (for kernel type estimators). However, the sequence of scaling constants required for unit asymptotic variance differs for the two estimators; for $m_n(x)$ it is $\{\sigma^2(x)\int K^2(u)\,du/[n\varepsilon_nf_1(x)]\}^{1/2}$ and for $\hat m_n(x)$ it is $\{s(x)\int K^2(u)\,du/[n\varepsilon_nf_1(x)]\}^{1/2}$. Since

$$\sigma^2(x) = s(x) - m^2(x) \le s(x),$$

this indicates that $\hat m_n$ may display more dispersion about $m$ for finite sample sizes than $m_n$.




4.2  An  Example. 


In order to illustrate the behavior of the estimators in one specific case, we have computed $m_n$ and $\hat m_n$ for a set of artificial data. We have also computed the approximate confidence intervals given by (3.2.26) for $m$, based on $\hat m_n$. The results of the computations are depicted in Figures 1-6, and we have also shown a scatterplot of the data and the true regression function on each figure. The data consist of $n = 200$ points $(X_i,Y_i)$ chosen independently with $X_i \sim U(-3,2)$ and

$$Y_i = X_i^3/3 + X_i^2 + Z_i,$$

where $Z_i$ is a standard normal variable independent of $X_i$. Thus, for these data

$$m(x) = x^3/3 + x^2.$$

All calculations are for kernel type estimators with kernel function given by a standard normal density function, truncated at $\pm 3$ and normalized so as to be a probability density.
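The setup just described can be sketched in a few lines of Python. This is an illustration under stated assumptions and not the computation used for the figures. It takes the two estimators to be $m_n = m_n^*/f_n$ and $\hat m_n = m_n^*/f$ with $m_n^*(t) = (n\varepsilon_n)^{-1}\sum Y_iK((t-X_i)/\varepsilon_n)$, as the proof of Theorem 3.3.2 suggests. The bandwidth exponent 0.21, the random seed and all function names are assumptions introduced here.

```python
import math
import numpy as np

A = 3.0  # truncation point of the kernel

def kernel(u):
    """Standard normal density truncated at +/-A and renormalized to integrate to 1."""
    u = np.asarray(u, dtype=float)
    mass = math.erf(A / math.sqrt(2.0))  # P(|Z| <= A) for a standard normal Z
    dens = np.exp(-0.5 * u ** 2) / math.sqrt(2.0 * math.pi)
    return np.where(np.abs(u) <= A, dens / mass, 0.0)

def estimates(t, x, y, eps, f):
    """Return (m_n, m_hat) on the grid t: estimated-density and known-density versions."""
    t = np.asarray(t, dtype=float)
    n = len(x)
    w = kernel((t[:, None] - x[None, :]) / eps)  # shape (len(t), n)
    m_star = (w * y).sum(axis=1) / (n * eps)     # estimates m(t) f(t)
    f_n = w.sum(axis=1) / (n * eps)              # kernel density estimate of f(t)
    return m_star / f_n, m_star / f(t)

def simulate(n=200, seed=0):
    """Draw X ~ U(-3, 2) and Y = X^3/3 + X^2 + Z with Z standard normal."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-3.0, 2.0, size=n)
    y = x ** 3 / 3.0 + x ** 2 + rng.standard_normal(n)
    return x, y

if __name__ == "__main__":
    x, y = simulate()
    eps = len(x) ** -0.21  # assumed bandwidth exponent
    t = np.linspace(-3.0, 2.0, 11)
    m_n, m_hat = estimates(t, x, y, eps, lambda s: np.full_like(s, 1.0 / 5.0))
    truth = t ** 3 / 3.0 + t ** 2
    for row in zip(t, truth, m_n, m_hat):
        print("t=%5.2f  m=%7.3f  m_n=%7.3f  m_hat=%7.3f" % row)
```

Near $t = -3$ and $t = 2$ only about half of the kernel mass falls inside the support of $X$, so the known-density version is pulled toward zero there, while the ratio form compensates through $f_n$.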

Figures 1 and 2 show the estimators $m_n$ and $\hat m_n$, respectively, with $\varepsilon_n = n^{-.21}$, and Figures 3 and 4 show $m_n$ and $\hat m_n$ with slightly less smoothing, $\varepsilon_n = n^{-.4}$. The previously discussed bias of $\hat m_n$ is evident at the upper endpoint on Figures 2 and 4, although $m_n$ and $\hat m_n$ do not differ by very much at the lower endpoint. The difference in the asymptotic variances of $m_n$ and $\hat m_n$ does not manifest itself in this example, although $\hat m_n$ in Figure 4 has a slightly more variable appearance than $m_n$ in Figure 3.

Figures 5 and 6 show the approximate confidence bands given by (3.2.26) for $\alpha = .1$, and $\varepsilon_n = n^{-.21}$ and $\varepsilon_n = n^{-.4}$, respectively. The confidence bands (3.2.26) are asymptotically valid for any subinterval of $[-3,2]$. In practice, however, one should consider these confidence bands to be approximately valid only for intervals well within the support of $X$, since the earlier remarks on the endpoint bias of $\hat m_n$ apply to the confidence bands also. These confidence bands were calculated using the true conditional second moment

$$s(t) = 1 + [t^3/3 + t^2]^2.$$

In  practice,  one  would  use  an  estimator  of  s(t),  e.g.  the  consistent 
estimator 

sn(t)  =  ^nen)1.51YiKat'Xi)/£n)  • 
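As a rough illustration of how the band is evaluated at a point, the following Python sketch computes $C(\alpha)$ and a half-width of the form $(n\varepsilon_n)^{-1/2}[s(t)\lambda(K)/f(t)]^{1/2}[C(\alpha)/(2\delta\log n)^{1/2} + d_n]$, which is the form obtained by inverting the probability statement that precedes (3.2.26). The constants $d_n$ and $\lambda(K)$ are defined with Theorem 3.2.1 and are not restated in this section, so they are taken here as inputs; the numerical values passed for them in the demonstration are placeholders, not values from the text, and the function names are assumptions.

```python
import math

def c_alpha(alpha):
    """C(alpha) = log 2 - log|log(1 - alpha)|."""
    return math.log(2.0) - math.log(abs(math.log(1.0 - alpha)))

def second_moment(t):
    """True conditional second moment s(t) for the simulated example."""
    return 1.0 + (t ** 3 / 3.0 + t ** 2) ** 2

def band_half_width(n, delta, alpha, s_t, f_t, lam_k, d_n):
    """Half-width of the band at one point t, with eps_n = n**(-delta).

    lam_k and d_n must be supplied by the caller (see Theorem 3.2.1).
    """
    eps = n ** (-delta)
    scale = math.sqrt(2.0 * delta * math.log(n))
    return math.sqrt(lam_k * s_t / (f_t * n * eps)) * (c_alpha(alpha) / scale + d_n)

if __name__ == "__main__":
    # lam_k=0.3 and d_n=3.0 are placeholder inputs for illustration only.
    for t in (-1.0, 0.0, 1.0):
        hw = band_half_width(n=200, delta=0.21, alpha=0.1,
                             s_t=second_moment(t), f_t=0.2, lam_k=0.3, d_n=3.0)
        print("t=%5.2f  half-width=%.3f" % (t, hw))
```

Because $s(t)$ enters through a square root, the band is widest where $|m(t)|$ is largest, which for these data is toward the right end of the interval.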


A first step in such a proof might be to show the equivalence of $V_n$ to the process

$$V_n'(t) = [f(t)/f_n(t)]V_n(t)$$

(in the sense of $\|V_n - V_n'\| = o_p((\log n)^{-1/2})$). Successive approximations, as in Theorem 3.2.1, would lead eventually to the equivalence of $V_n'$ to the Wiener process of Theorem 3.2.2, and thus to the asymptotic distribution of the maximum absolute deviation of $V_n$.

We have not been able to carry out the technical details of the proof of such a theorem. However, if it were to be proved, one application would be a confidence band such as (3.2.26), but based on $m_n$ instead of $\hat m_n$, and therefore narrower since $m_n$ is asymptotically less variable than $\hat m_n$.


[Figure: estimated regression, true regression, and scatter plot.]